SQM & bufferbloat advice/help

I have a crummy ADSL line here in the UK with horrendous bufferbloat which without SQM makes the internet basically unusable:

With SQM configured with cake, bufferbloat is much better and makes the broadband usable:

My questions were as follows:
SQM is already doing an amazing job and the average latencies are way down, but there is still a spike in latency right at the beginning of the speedtest. Is there anything that can be done about this?

My property is finally going to be upgraded to FTTP, probably sometime in 2021, but I am looking for a new broadband package because my contract is ending and the price is going up as a result. Is the bufferbloat on my line more likely to be caused by Openreach's infrastructure or by the ISP? I'm trying to understand whether changing ISP would be of any benefit.

Lastly, we make a load of international calls, and copper line international phone packages are on the whole pretty expensive. I was thinking of switching to VoIP as FTTP is coming sometime this year anyway. Would VoIP be usable with the above latency, or should I avoid it until I get FTTP?

Many thanks to anyone who can help with some advice.

What kind of qdisc did you configure?

If you chose CAKE, did you set the Link Layer adaptation value?

What throughput does your ISP guarantee (download and upload)?

What did you set your download and upload bandwidth to?

How many simultaneous phone calls do you have going (maximum)?

Thanks for the response.
It's currently set to cake with the piece_of_cake script.
ISP estimate: 1-3 Mbit/s, guaranteed 1.3 Mbit/s.
Download and upload in the SQM settings are set to 2.8 Mbit/s and 0.7 Mbit/s.

Maximum number of phone calls would be 2. But it definitely would need to work well enough so that 1 person can be using the internet for work and another making a phone call.

This is unavoidable due to the way TCP works.

SQM can only drop packets "for real" in the upload direction. It can thus guarantee that your hosts, when sending data, will not overload the link. For the download direction, any attempt to drop a packet is, conceptually, too late - it has already been received, and it has already stood in the queue, and it has already delayed other packets. So SQM only pretends that the incoming packet has been dropped, by not letting the kernel process it. Anyone can still flood your public IP address with many unwanted packets and cause a queue to build up on the ISP side.

Ingress shaping only "works" for TCP connections because of the congestion control mechanism that all compliant TCP implementations implement. That is, very roughly: the sender keeps sending faster and faster, and listens for the receiver to acknowledge the receipt of each segment. When the receipts (ACKs) stop coming, the sender knows that it has stressed the network too much, and backs off. But it's already too late: these excess packets have been placed in the queue on the ISP side, delaying everything else, just like in the UDP case. In essence, even on a 50 Mbit/s connection, the start of every TCP flow involves the sender sending data at, say, 70 Mbit/s for a short while, thus overloading the link.
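To make the overshoot concrete, here is a rough sketch of an idealized slow start in which the sender doubles its window every round trip. The 50 ms RTT, the initial window of 10 segments and the 1500-byte segment size are assumptions chosen for illustration, not measurements from this line, and real TCP stacks are more nuanced than this.

```python
def slow_start_rates(link_bps, rtt_ms, initial_segments=10,
                     segment_bytes=1500, rounds=6):
    """Return (round, sending_rate_bps, exceeds_link) for each RTT.

    Idealized model: the window doubles every RTT and nothing is lost,
    so the sender has no reason to back off yet.
    """
    results = []
    cwnd = initial_segments
    for rnd in range(rounds):
        # Bits sent in one RTT, converted to bits per second.
        rate_bps = cwnd * segment_bytes * 8 * 1000 / rtt_ms
        results.append((rnd, rate_bps, rate_bps > link_bps))
        cwnd *= 2
    return results


if __name__ == "__main__":
    # 2.8 Mbit/s is the download rate configured in SQM in this thread.
    for rnd, rate, over in slow_start_rates(link_bps=2_800_000, rtt_ms=50):
        note = " (exceeds the link)" if over else ""
        print(f"RTT {rnd}: {rate / 1e6:.1f} Mbit/s{note}")
```

Under these assumptions the sender is already above the link rate by the second round trip, and the loss signal only reaches it about one RTT later.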

Therefore, your only choices are either to ignore the problem or to shape the line even further.

Thanks for the explanation, it makes more sense now. Do you think changing ISP would make any difference whatsoever, keeping in mind it would still be over the same copper line?

Lastly, do you think the above latency would be acceptable for VoIP, or do you think I will have to shape the line further?

I think this should be good enough for VoIP but not necessarily for games (this one would be a good test: https://littlebigsnake.com/). I would rather not try to guess whether changing ISP would help. And it doesn't cost much effort to try shaping the line a bit further and seeing whether there is any benefit.

Thanks for your help. I've just got to find the right balance as if I shape too much more then my already low download speeds will be even slower, albeit with lower latency.

It depends, as always ;). Rate-wise I see no issue; a VoIP stream takes around 100 kbps in each direction. Latency-wise it is slightly more complicated. Typically people are relatively tolerant of some delay in a conversation, but the larger that delay gets, the more awkward it gets. IIRC around 100 to 150 ms end-to-end delay is something people can learn to accept; above that it gets less natural/comfortable. Compared to old analog telephony, your 20 ms initial delay will now reduce the distance up to which phone calls will feel smooth and direct. But I do not think that for calls inside the British Isles you will actually notice a switch from POTS (plain old telephone service) to VoIP, unless your VoIP provider is exceptionally incompetent.
But note that if your link shows high numbers of CRC errors, these can cause pops and missing parts in VoIP calls.
Personally I switched to VoIP in 2013 and am not missing anything, but YMMV....

All good points. Typically, non-TCP transports also try to adjust to the available capacity and hence react to the same signals as TCP flows. The only difference is that for TCP that mechanism, called congestion control (CC), comes as part of the protocol and hence is handled by the OS's TCP stack, while e.g. each application using UDP needs to implement its own CC mechanism. Not all non-TCP applications do this properly or at all, but many do.
And SQM will not only delay packets and hence delay the ACK feedback that in turn affects the sending rate (I am simplifying here a bit); SQM will also drop carefully selected packets, as most transport protocols interpret (consistent) packet loss as a sign of network congestion and reduce their transmit rate to adapt to the expected network capacity. That is a clearer signal than delayed ACKs, and to be honest SQM will only actually delay ACKs if there is a queue built up in the direction carrying the ACKs.

Thanks. I think I'll give it a go then and see how it goes!

+1; the way e.g. TCP works is that when the signal of overload (an ACK signaling that at least one packet was missing) reaches the sender, that sender has already been sending too much data for around 1 RTT worth of time, and this burst of packets is going to find its way into the ISP side of your access link. That side typically has over-sized but under-managed buffers, so the packets will collect there and build up a queue which, because most simple buffers are FIFO (first in, first out), will delay all other packets, like the latency probe packets of the speedtest.
SQM's trick is to try to avoid that situation by sending flows the slow-down signal earlier (and cake offers an "ingress" keyword which will make it a bit more aggressive for incoming flows, by taking dropped packets into account, which for egress are simply ignored). But that is approximate: if packets rush into the ISP end of your link faster than the link can carry them, they will accumulate, and in your speedtest there were four simultaneous flows per direction (so 4 data flows and 4 reverse ACK flows, plus whatever else causes traffic on your link). The proper solution to these spikes would be to instantiate cake for ingress on the ISP's side of the link, which unfortunately no ISP seems to offer.
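As a back-of-the-envelope illustration of why a FIFO queue on the ISP side hurts so much on a slow link, the sketch below computes how long a newly arrived packet (for example a latency probe) waits behind a given number of queued full-size packets. The 1500-byte packet size and the queue depths are assumptions for illustration; the 2.8 Mbit/s rate is the download rate configured in this thread.

```python
def fifo_delay_ms(queued_packets, link_bps, packet_bytes=1500):
    """Time in milliseconds to drain `queued_packets` full-size packets
    from a FIFO queue feeding a link of `link_bps` bits per second."""
    return queued_packets * packet_bytes * 8 * 1000 / link_bps


if __name__ == "__main__":
    for depth in (1, 10, 40, 100):
        delay = fifo_delay_ms(depth, link_bps=2_800_000)
        print(f"{depth:3d} packets queued -> {delay:6.1f} ms extra delay")
```

Each full-size packet costs a little over 4 ms at this rate, so even a modest burst from a few flows adds tens of milliseconds.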

One thing you need to do is switch your CAKE to Layered CAKE because it’s more geared toward prioritizing VoIP over other flows but not allowing the priority traffic to cannibalize the entire bandwidth.

You have very little bandwidth to play with. Layered CAKE allocates 25% to the priority queue. You barely have enough bandwidth for two calls if you want CAKE to manage your queues.

Another thing you need to do is to make sure that the IP phones mark their packets correctly with DSCP 46 (EF), and if you have other equipment between the IP phones and the OpenWRT router, you need to make sure that the equipment doesn’t zero out or otherwise modify the IP Phone QOS DSCP markings.
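For reference, DSCP 46 (EF) occupies the upper six bits of the IP TOS/Traffic Class byte, so the byte value a device writes is 46 shifted left by two bits. The sketch below shows that mapping and how a Linux application could request the marking on a UDP socket. It is only an illustration of what "marking with EF" means: the helper names are hypothetical, an ATA or IP phone does this through its own settings, and some operating systems ignore or restrict `IP_TOS`.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding


def dscp_to_tos(dscp):
    """Convert a 6-bit DSCP value to the 8-bit TOS byte (ECN bits zero)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_ef_udp_socket():
    """Hypothetical helper: UDP socket whose packets are marked DSCP EF."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(DSCP_EF))
    return sock


if __name__ == "__main__":
    print(hex(dscp_to_tos(DSCP_EF)))  # 0xb8
```

A packet capture on the router's LAN side showing TOS 0xb8 on the RTP packets is one way to confirm that the marking survives any equipment in between.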

So, if you are going to get the “IP PBX in the cloud” kind of service, which is what I believe you are planning to do, you have two choices:

  1. The VoIP provider that ships pre-configured IP phones to you. Usually, you can’t reconfigure these IP phones. In this case, you must make sure that the IP phones are marking voice packets with DSCP 46 (EF). So, talk to the provider to make sure this is the case before you buy their service.
  2. The VoIP provider that allows BYOD (bring your own device). In this case, you must make sure that the IP phones (or ATAs, analog telephone adapters) you select are capable of being configured to mark voice packets with DSCP 46 (EF). And when you configure them, you must make sure that you tell them to mark voice packets with DSCP 46 (EF).

The third choice is for you to have a small PBX on premise and purchase a SIP trunk from an ITSP (Internet Telephony Service Provider). This is usually a more complicated route but it provides significant savings. However, with only two phones, the savings may not be worth the effort.

Edit:

Also, you mentioned that the upstream bandwidth configured in SQM is 700 Kbps. Layered CAKE polices the priority queue at 25% of the configured bandwidth.

700 Kbps * 0.25 = 175 Kbps. One G.711 call consumes 80 Kbps (RTP payload + IP header + RTP header + UDP header) without factoring in any Layer-2 overhead. So, without the L2 overhead, you barely have enough Layered CAKE priority queue bandwidth for two G.711 calls. You have fewer than 20 bytes of available L2 overhead left in each RTP packet before you hit the 175 Kbps bandwidth limit in the priority queue. Once that limit is exceeded, CAKE will start tail-dropping packets out of the priority queue, and people on the other end of the phone call will experience dropped sounds or even dropped syllables.

Cisco recommends to set the priority queue to no more than 33% of the total bandwidth. If we could do this in CAKE, you would have 231 Kbps limit in the upstream priority queue, which would give you 88 bytes per packet available for the L2 overhead in each of the two simultaneous calls before you would hit the priority queue bandwidth limit of 231 Kbps. The 88 bytes of L2 overhead per packet is sufficient for any L2 protocol, including your flavor of DSL with PPPOE. Unfortunately, to the best of my knowledge, the 25% reservation of the total bandwidth as the priority queue bandwidth limit cannot be modified in CAKE.
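The headroom figures above can be reproduced with a few lines of arithmetic. This sketch assumes G.711 with 20 ms packetization (50 packets per second, 160 bytes of payload) and standard RTP, UDP and IPv4 header sizes; the function name is made up for illustration.

```python
PACKETS_PER_SECOND = 50               # assumed 20 ms packetization
G711_IP_PACKET_BYTES = 160 + 12 + 8 + 20  # payload + RTP + UDP + IPv4


def l2_headroom_bytes(link_kbps, priority_percent, calls):
    """Bytes of Layer-2 overhead each packet may carry before `calls`
    simultaneous G.711 calls exceed the priority queue limit."""
    limit_bps = link_kbps * 1000 * priority_percent / 100
    used_bps = calls * G711_IP_PACKET_BYTES * 8 * PACKETS_PER_SECOND
    return (limit_bps - used_bps) / (calls * PACKETS_PER_SECOND * 8)


if __name__ == "__main__":
    for percent in (25, 33):
        room = l2_headroom_bytes(link_kbps=700, priority_percent=percent, calls=2)
        print(f"{percent}% priority share: {room:.2f} bytes of L2 headroom per packet")
```

With a 25% share this gives 18.75 bytes per packet, and with 33% it gives 88.75 bytes, matching the "fewer than 20" and "88" figures quoted in the post.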

Chances are that CAKE would be dropping voice packets if both G.711 calls are active in your environment. Therefore, you should also inquire about the availability of the G.729 codec with the VoIP provider, as it may be necessary to run one or both phones at G.729 instead of G.711. The selection of the codec (G.711 vs G.729) would have to be decided on the empirical basis (aka trial and error).
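To see why G.729 helps, the sketch below compares the IP-level bandwidth of one call per codec and how many calls would fit under a 175 Kbps limit. It assumes 20 ms packetization for both codecs (160-byte G.711 payload, 20-byte G.729 payload) and ignores Layer-2 overhead entirely, so the real numbers on a DSL link will be less favourable.

```python
PACKETS_PER_SECOND = 50        # assumed 20 ms packetization
HEADER_BYTES = 12 + 8 + 20     # RTP + UDP + IPv4
CODEC_PAYLOAD_BYTES = {"G.711": 160, "G.729": 20}


def call_kbps(codec):
    """IP-level bandwidth of one call direction, in kbit/s."""
    packet_bytes = CODEC_PAYLOAD_BYTES[codec] + HEADER_BYTES
    return packet_bytes * 8 * PACKETS_PER_SECOND / 1000


def max_calls(codec, limit_kbps):
    """Whole number of simultaneous calls that fit under `limit_kbps`."""
    return int(limit_kbps // call_kbps(codec))


if __name__ == "__main__":
    for codec in CODEC_PAYLOAD_BYTES:
        print(f"{codec}: {call_kbps(codec):.0f} kbit/s per call, "
              f"{max_calls(codec, 175)} calls fit in 175 kbit/s")
```

Note how much of a G.729 packet is headers rather than voice: 40 of its 60 bytes, which is why per-packet overhead matters so much for low-bitrate codecs.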

You didn’t answer the question about the Link Layer Adaptation setting.

Only if a) your ISP properly marks the incoming packets... and b) the bandwidth reserved for the high-priority tin is larger than 100 Kbps; otherwise cake will essentially ignore the DSCP and schedule the packets into the lower-priority tins. But I agree, it is worth trying layer_cake over piece_of_cake here, though I am not certain that it would be theoretically better.

Two concurrent calls, which honestly seems like an acceptable trade-off, no?

+1; @tombadog, could you please post the output of the following three commands, issued from an SSH shell on the router:

ifstatus wan
cat /etc/config/sqm
tc -s qdisc

700 Kbps * 0.25 = 175 Kbps. That is not enough for two calls if they are G.711 calls, as long as CAKE manages the queues, because CAKE polices the priority queue at 25% of the total bandwidth.

Chances are CAKE would be dropping voice packets if both calls are active.

So, the original poster should also inquire about the availability of the G.729 codec with the VoIP provider, as it may be necessary to run one or both phones at G.729 instead of G.711. I’ll update my previous post with this info.

Your a) assumption is incorrect. The VoIP packets need to be scheduled into the priority queue in the egress (out to the Internet) direction. The marking of the VoIP packets in the ingress (from the Internet) direction by the VoIP provider is also desirable but most likely outside of the OP’s control and also is not as critical, as the OP has three times the amount of download bandwidth as he does the upload bandwidth.

However, the OP can control the scheduling of the VoIP packets in the egress (out to the Internet) direction into the priority queue by making sure that the VoIP packets arrive in the OpenWRT box properly marked with DSCP 46 (EF). Because the upstream bandwidth is only 700 Kbps, it’s extremely important that the VoIP packets are scheduled into the priority queue in the egress (out to the Internet) direction, so much so that without the VoIP packets being scheduled into the priority queue in the egress direction, the IP calls will most likely be unusable as long as the OP consumes any Internet-based content or sends email while on the phone.

Thanks for all the info.
Sorry, I forgot: Link Layer Adaptation is set to ATM.
I was planning to use my current DECT phone with an ATA so thanks - I can check for the marking capability before purchasing.

Codec-wise, the VoIP provider supports both of those codecs, so I should probably check whether any potential ATAs support them as well?

Please set the overhead to 44. Overhead 0 is certainly wrong....
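To illustrate why the overhead value matters on an ATM link, here is a sketch of the usual way ATM framing is modelled: the per-packet overhead is added to the IP packet, the result is split into 48-byte cell payloads (the last cell is padded), and each cell occupies 53 bytes on the wire. The 44-byte overhead is the value suggested above; the function itself is an illustrative model, not a copy of what the shaper does internally.

```python
import math

ATM_CELL_PAYLOAD = 48  # usable bytes per ATM cell
ATM_CELL_SIZE = 53     # bytes per cell on the wire, including the header


def atm_wire_bytes(ip_packet_bytes, overhead=44):
    """Approximate on-the-wire size of one IP packet on an ATM link."""
    cells = math.ceil((ip_packet_bytes + overhead) / ATM_CELL_PAYLOAD)
    return cells * ATM_CELL_SIZE


if __name__ == "__main__":
    for label, size in (("G.729 RTP", 60), ("G.711 RTP", 200), ("full-size", 1500)):
        wire = atm_wire_bytes(size)
        print(f"{label:10s} {size:5d} bytes IP -> {wire:5d} bytes on the wire")
```

Under this model a 200-byte G.711 packet takes 6 cells, or 318 bytes, which at 50 packets per second is roughly 127 kbit/s per call direction rather than 80 kbit/s. With overhead 0 the shaper would underestimate small packets like these.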

Yes, make sure the ATA(s) support G.729 and also make sure that the ATAs have a setting to mark VoIP (RTP) packets as DSCP 46 (EF).

Do you have an Ethernet switch? Is it managed? If it is, make sure it doesn’t overwrite QoS markings. The easiest thing is to turn QoS off in the switch completely.
