Okay, that pretty much looks like starvation, I agree (for what it is worth, I am no expert in this field).
Try installing iftop and look at it to get an idea what is causing these upstream spikes.
I guess the issue is that this phone is massively more latency sensitive than the other applications. You could run mtr against say the SIP server address to see how bad the cyclic delay increases actually are.
Sure, the bigger question is how difficult it would be to do so. How about you switch from piece_of_cake to layer_cake first? This will introduce a higher priority "tin" which, if it has enough bandwidth, might be sufficient to keep your voice calls audible from the outside.
If you switch to layer_cake, please post the output of tc -s qdisc so we can check how much bandwidth is reserved for high-priority traffic.
You could alternatively try to switch to per-internal-host-IP fairness, but that will only work if you have <= 7 concurrent computers active (I simply assume that a VoIP call requires around 100 Kbps per direction). Let's try layer_cake first. In the unlikely event that your Cisco SPA112 already uses standards-conforming DSCPs on its packets, that might already fix your issue (but I would consider that to be very unlikely). But see page 8 of https://spiderwebsolutions.com.au/wp-content/uploads/2017/09/Cisco-SPA112-and-SPA-122-Simple-Configuration-Guide.pdf which seems to show how to configure DSCPs for your Cisco device....
Sure, but it comes with the same configuration challenges as increasing/guaranteeing the bandwidth for the Cisco ATA... (and layer_cake also has something up its sleeve for this case: the bulk "tin" for background/low-priority traffic, which will yield to higher-priority traffic quickly).
Well, traditional QoS systems give all the freedom to configure every last detail manually, but the goal of sqm is to pretty much work out of the box without the need for much configuration. But sqm can basically be used to run self-made QoS scripts and hence still allows finely detailed QoS schemes; it just does not offer them out of the box....
DSL should operate at exactly the speed of your plan. If you're not getting the advertised speed you need to investigate with the phone company. I don't know exactly what the quality numbers mean but any SES at all seems bad. If you talk to the phone company they can check what their equipment is receiving.
Don't trust old phone wiring in the house. You should have a direct line of modern twisted pair cable (cat5e or at least cat3) from the phone company demarcation to your modem.
SQM doesn't do anything unless it is set for lower than the actual speed imposed by the line hardware or the rate limiter at the ISP end. The trick is of course to get it just slightly lower.
Excellent news, let's try the layer_cake route! BTW your sync is pretty high for your shaper settings, so could you say perform 3 dslreports speedtests in a row with SQM disabled and post them here? I have a hunch that we might be able to get decent bufferbloat results with higher usable bandwidth.
I can't access these settings. Only OVH can. After initializing the Cisco SPA112 with the default username/password, I was logged out and now only OVH can log in.
Sure, here are three tests I did before enabling SQM, with only one computer connected to my modem/router and with one web browser page open.
I did 10 tests in a row to reach an average upload of 8800 kbit/s (95% makes it 8400) and download of 810 kbit/s (95% makes it 770).
here is the list of the 10 tests before enabling SQM
and here are 10 tests I did after enabling SQM, when I was testing whether overhead should be 14, 13, 12, 11 or even 10. As you can see, in the last one, with overhead at 10, I have what seems to be a good result (as far as I understand them).
I'm going to try the SIP phone with the server connected now that I have changed to layer_cake and I'll come back to let you know the result of my tests.
That would work (you might need to run opkg update ; opkg install mtr first) or you can run it from linux/macos machines, for windows there is https://sourceforge.net/projects/winmtr/.
So 192496/1000 = 192.496 Kbps, that should be sufficient for roughly two concurrent VoIP calls with the usual codecs, so I believe this might actually work out...
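As a quick sanity check of the estimate above, here is a minimal sketch. It assumes the 192496 figure is in bit/s and reuses the rough 100 Kbps per call and direction assumed earlier in this thread; real codecs may need more or less, so treat the result as a ballpark only.

```python
# Assumption: roughly 100 Kbps per VoIP call and direction (rough figure from this thread).
PER_CALL_KBPS = 100


def voice_tin_capacity(tin_rate_bps, per_call_kbps=PER_CALL_KBPS):
    """Return the tin rate in Kbps and how many calls' worth of bandwidth that is."""
    tin_kbps = tin_rate_bps / 1000
    return tin_kbps, tin_kbps / per_call_kbps


tin_kbps, calls = voice_tin_capacity(192496)
print(f"{tin_kbps:.3f} Kbps is about {calls:.1f} calls' worth of bandwidth")
```

So "roughly two" calls holds only if each call stays at or slightly below the assumed 100 Kbps.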
Fair enough, or rather unfair enough. But try on your router the following:
then perform a voice call (not too long so the capture file does not get too large please)
press control-C on the router shell to stop tcpdump.
then copy the resulting capture file /tmp/pppoa-VoIP-test.cap from your router to a computer (using say scp or winscp), then open this file with wireshark and look at potential DSCP markings on packets sent by the Cisco device. Please report your results here.
Great, so now these average values are net goodput while you should feed the shaper with gross rates.
So I would try the following (and then test and potentially reduce the rates until the bufferbloat plots look acceptable):
8800 * 53/48 * (1510/(1500-20-20)) = 10049.42, and 10049.42 * 0.9 = 9044.478 Kbps -> try 9000 Kbps
810 * 53/48 * (1510/(1500-20-20)) = 925, and 925 * 0.99 = 915.75 Kbps -> try 900 Kbps
Since for the upload you control the access to the bottleneck link, you should be able to get much closer to the real link's capacity than on ingress.
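The gross-rate estimate above can be reproduced with a small sketch. The function name and parameter names are made up for illustration. My reading of the terms is an assumption: 53/48 is the ATM cell overhead, 1510 is a 1500-byte packet plus the 10 bytes of per-packet overhead tested earlier, and 1500-20-20 is the payload left after IP and TCP headers. The results differ slightly from the hand calculation only because of intermediate rounding.

```python
def gross_shaper_rate(goodput_kbps, safety_factor, mtu=1500,
                      per_packet_overhead=10, ip_tcp_headers=40):
    """Estimate the gross link rate from measured goodput, then scale it for the shaper.

    Assumptions (not confirmed by the thread): ATM framing (53-byte cells
    carrying 48 bytes of payload) and full-size TCP/IPv4 packets.
    """
    atm_expansion = 53 / 48
    packet_expansion = (mtu + per_packet_overhead) / (mtu - ip_tcp_headers)
    gross = goodput_kbps * atm_expansion * packet_expansion
    return gross, gross * safety_factor


for goodput, factor in [(8800, 0.9), (810, 0.99)]:
    gross, shaped = gross_shaper_rate(goodput, factor)
    print(f"goodput {goodput} Kbps -> gross {gross:.2f} Kbps -> shaper ~{shaped:.2f} Kbps")
```

The safety factors 0.9 and 0.99 are just the starting points from the post; the speedtest plots decide whether they need to come down.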
First long attempt: a 14 min call without any issue. Yeah! Long live layer_cake! Well, before drawing conclusions I'd rather wait a few days and make several phone calls to be sure it is indeed working perfectly.
Ok, I'll try that once I feel the SIP phone is really working. So you say that I should put 9000/900 instead of 8400/770 and then reduce until the "bufferbloat plots look acceptable". What does "bufferbloat plots look acceptable" mean? How do I know that?
That is the right approach! Give it some testing before declaring mission accomplished.
These should be independent of the SIP phone, or rather the increase in the upload direction should improve the situation somewhat. But again treating this as orthogonal to the SIP issue sounds like a decent plan.
Yes.
Run a dslreports speedtest and then go to the detailed results page. In the grades section you see the bar graphs for idle, downloading and uploading; if you click the links under the 3 bars you will see all results from the individual latency probes. If they all stay nice and low, it would be acceptable to me, but your acceptance threshold might be different, so just look at it yourself. For reference, I would say the plots in https://www.dslreports.com/speedtest/45499092 certainly show acceptable bufferbloat, at least for my taste; so if the results with 9000/900 look very similar -> mission accomplished ;).
Well, while it would be interesting to learn which DSCP marks your SIP traffic actually uses, it is less urgent as long as layer_cake really solves your problem.
So let me propose a simpler test:
Run tc -s qdisc, then do a call and immediately afterwards run tc -s qdisc again; we should then see an increase in traffic in the counters for the egress voice tin...
So it looks like your link is severely overloaded here: all worst and average RTTs increase to unhealthy levels (>100ms) after your modem. I can see how VoIP could be affected by this (but only latency-wise, I still fail to understand why your calls get silenced by this).
I discovered that when I ran mtr, someone was calling at the same time. As I imagine it could affect the result, here is a new one taken when nobody was calling.
These are the relevant counters from the three? calls to tc -s qdisc (why three?)
pkts 0 4120047 33942
pkts 0 4131112 35104
pkts 0 4132309 35129
Assuming the first and last to be relevant I see:
4132309-4120047 = 12262 best effort packets
and
35129-33942 = 1187 packets in the Voice tin
with voip typically sending at 50 packets per second (see e.g. https://www.cisco.com/c/en/us/support/docs/voice/voice-quality/7934-bwidth-consume.html)
this would be 1187/50 = 23.74 seconds worth of a voip call, does this sound reasonable?
Anyway this does show that there is some usage of those dscp marks that get moved into the high priority voice tin.
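The counter arithmetic above as a minimal sketch. The three rows are the pkts lines quoted in this post; treating the second column as best effort and the third as the Voice tin follows the post, while labelling the first column as Bulk is my assumption. The 50 packets per second is the typical figure from the Cisco page linked above.

```python
# pkts counters from the three tc -s qdisc runs: (bulk?, best effort, voice).
# The "bulk" label for the first column is an assumption.
samples = [
    (0, 4120047, 33942),
    (0, 4131112, 35104),
    (0, 4132309, 35129),
]

PACKETS_PER_SECOND = 50  # typical VoIP packet rate, per the Cisco document cited above

first, last = samples[0], samples[-1]
best_effort_delta = last[1] - first[1]
voice_delta = last[2] - first[2]
call_seconds = voice_delta / PACKETS_PER_SECOND

print(f"best effort: +{best_effort_delta} packets")
print(f"voice tin:   +{voice_delta} packets, about {call_seconds:.2f} s of one-way audio")
```

If the call lasted much longer than that, part of the voice traffic is probably not carrying a mark that lands in the Voice tin.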
Take your time. If your problem is fixed by layer_cake, maybe you do not need to measure this at all; really, I am just curious and my curiosity will subside eventually.
I don't know because I restarted the router recently to make tests so I couldn't notice any new disconnection. If it happens, as you seem to be interested, I'll let you know.
Since in the end it seems the bigger issue was the phone I was thinking maybe to suggest an edit to the post title so others with similar phone issues might find it easier in searches...
Oh, I see. It's true the subject has changed over the course of the conversation, and indeed the SIP phone was the bigger issue and has found its solution (it looks like it has so far).
So, yeah, sure, I'll change the title.
OK, the phone has been working well since I changed to layer_cake. So, thank you so much @moeller0!
I did it and I even tried to increase them by +100 at a time to reach the pair 9800/1000. Results look good to me even though sometimes the first bufferbloat sample for upload is a bit higher (but that was already the case when I was at 8400, so I guess it's ok).
If I increase to 10000/1100, the bufferbloat samples on the graphs go crazy, so I have reached the very maximum.
This one is good
but this one is less good (first sample for upload)
but of the five tests done, none is worse than this one, so I assume it's ok. Can you confirm?
This is a policy question, so you need to make this decision. I probably would go for 9000/1000 just to be prepared for multiple incoming bulk flows; on the other hand you did exactly what I recommended: iteratively probe your personally acceptable trade-off between latency-under-load increase and bandwidth sacrifice. So if you are happy with 9800/1000 then by all means go for it. I note that the initial download spike is a consequence of setting the download value relatively high, so the backspill of packets into the upstream buffers gets noticeable even for only 8 concurrent streams.
Great, merci! Just select one of the packets of this flow in wireshark, look at the VoIP session packets (most likely UDP) and search in the internet protocol header for:
Differentiated Services Field: 0xb8 (DSCP: EF PHB, ECN: Not-ECT)
1011 10.. = Differentiated Services Codepoint: Expedited Forwarding (46)
.... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
and search for a SIP packet and also look in the internet protocol header for:
As you can see, in my case neither uses the default value of 0. I note that cake's diffserv3 will treat AF41 like 0, but the critical EF marking for the actual VoIP data packets ends up in the high priority voice tin.
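For reading the wireshark line, a minimal sketch of how the single Differentiated Services byte splits into DSCP (upper six bits) and ECN (lower two bits). The function name and the small name table are only illustrative, and the table covers just the code points mentioned in this thread.

```python
# Illustrative lookup for the code points discussed here; not a complete DSCP table.
DSCP_NAMES = {0: "default (CS0)", 34: "AF41", 46: "EF"}


def decode_ds_field(ds_byte):
    """Split the 8-bit DS field into (dscp, ecn)."""
    dscp = ds_byte >> 2    # upper six bits
    ecn = ds_byte & 0b11   # lower two bits
    return dscp, ecn


dscp, ecn = decode_ds_field(0xB8)
print(f"DSCP {dscp} ({DSCP_NAMES.get(dscp, 'other')}), ECN {ecn}")
```

For the value shown above, 0xb8, this gives DSCP 46 (EF) with ECN 0, matching what wireshark displays.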