Bufferbloat problems still with using sqm

Hello

I have SQM set up following the guide in the documentation. Unfortunately it has not fixed my bufferbloat issues. I would post a report here, but dslreports for some reason is not measuring bufferbloat (the area is just blank??)

So to test I just put up a download that maxes my connection then ran some ping tests and launched a few online games. The pings were horrendous, literally unplayable! Here is the setup

Connection - dsl
Modem sync speeds - 3550kbps down/ 640 kbps up
Speed test - 2.97mbps down / 0.56mbps up
Modem - billion 7800 xl
Router - tp link wdr 4300 v1 (using latest lede firmware)

I wasn't able to get bridge mode to work on the modem so I just connected it as is to the router. Followed the guide instructions for using cake model and piece of cake. 44 byte overhead etc all that for dsl.

Not getting good results :frowning:

I tried limiting from 95% of my speed all the way down to 50% but it still didn't help. I also tried different queue setups like fq_codel, simplest.qos and all other combinations available there. I also tried different overhead values.

There is no difference in using any of them. I have also tried using it on an interface other than the wan, but it really screws up the router hard (speed gets limited to like 0.55mbps download. It's weird)

Thanks in advance for the help.

Ah, let's see, maybe I can help understand what is going on.

I would need the following pieces of information:

  1. cat /etc/config/sqm

  2. tc -d qdisc

  3. tc -s qdisc

  4. which ISP you are using.

Weird, you could try running their command-line interface (see https://www.dslreports.com/forum/speedtestbinary) to see how that works. (Side note: on mobile browsers the bufferbloat plots do not show up at all, but they do with a desktop browser or if "request desktop site" is selected (Firefox).) But just post a link to your dslreports result nevertheless; it might help in diagnosing why the bufferbloat plot seems to be missing...

This effectively means you are running in a double-NAT situation. Not ideal, but also not a real deal breaker (if your primary modem-router allows you to configure port redirects, that is).

Ideally you would empirically confirm the real overhead in use (see https://github.com/moeller0/ATM_overhead_detector/blob/master/ATM_overhead_detector.m). That said, 44 is pretty conservative, so this should not affect your actual issue.

Well at 50% it really really should; how did you assess that it does not work?

I would really recommend piece_of_cake/layer_cake for your use case.

An overhead that is too small will cause residual bufferbloat, especially with small packets; an overhead that is too big will simply decrease your available bandwidth. Unfortunately, setting the overhead too low can be masked by setting the shaper a bit lower...
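
To illustrate why small packets suffer most from an underestimated overhead, here is a rough Python sketch of ATM cell framing. The function name and the packet sizes are made up for illustration, and this is only an approximation of the accounting, not code from sqm-scripts or cake. The 44 bytes are the overhead from the setup described above.

```python
import math

ATM_CELL_BYTES = 53      # every ATM cell occupies 53 bytes on the wire
ATM_PAYLOAD_BYTES = 48   # only 48 of those bytes carry payload

def atm_wire_bytes(ip_packet_bytes, overhead_bytes):
    """Bytes on the wire for one packet: packet plus per-packet overhead,
    padded up to a whole number of ATM cells (hypothetical helper)."""
    cells = math.ceil((ip_packet_bytes + overhead_bytes) / ATM_PAYLOAD_BYTES)
    return cells * ATM_CELL_BYTES

# Compare the assumed real overhead (44) with a shaper that assumes none (0).
for size in (64, 1500):
    real = atm_wire_bytes(size, 44)
    guessed = atm_wire_bytes(size, 0)
    error_pct = 100 * (real - guessed) / real
    print(f"{size} byte packet: real {real} B, guessed {guessed} B, "
          f"underestimated by {error_pct:.1f}%")
```

Under these assumptions a 64 byte packet is underestimated by about a third, a 1500 byte packet by only about 3%, which is why a slightly lower shaper rate can hide the problem for bulk traffic but not for small packets.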

Not really. The shaper acts on an interface's ingress (called download in the GUI) and egress (upload in the GUI), but those directions only line up with internet download and upload on the wan interface. On all internally facing interfaces (LAN/WLAN) the interface directionality is reversed in relation to the internet. The 0.55 Mbps seems to match your upload speed from the speedtest with the shaper on the wan interface, so it probably reflects your egress shaper setting.

Let's see whether we can actually fix your situation...

Hey moeller0

Where and how do I get this information? I do everything from the web GUI hehe. The ISP is Internode (Australia)

Also I got the bufferbloat test working. It didn't work in Firefox for some reason, I had to use Chrome

First one is at 50% of my download speed

Second one is at 95% of download speed

Looks like I got good results for bufferbloat, but why was it different when doing a download test of my own?

I tried my own testing again, put up a download, youtube video, download from phone and that really bogged the internet down hard. Web pages were taking forever to load. Here is a shot of the graph when it was slow

Is there something wrong going on here?

SSH into the router and run from the command line.

I may have missed it, but are you running these tests over ethernet, or wireless?

It might also be helpful to see your network and wireless configurations...

cat /etc/config/network

cat /etc/config/wireless

Make sure to obscure the "option key" value(s) in the wireless config results before posting.

I suspect the Firefox issue had to do with an add-on (works fine for me on Firefox 56.0.2). Disable add-ons one at a time to see if one might be the cause.

Bridging to a modem is usually pretty straightforward. On the modem, turn off DHCP and wireless, and set the mode to bridge. The modem cannot have the same IP as the router.

config queue
	option debug_logging '0'
	option verbosity '5'
	option enabled '1'
	option interface 'eth0.2'
	option upload '500'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
	option linklayer 'atm'
	option overhead '44'
	option download '2800'
	option qdisc_advanced '1'
	option squash_dscp '1'
	option squash_ingress '1'
	option ingress_ecn 'ECN'
	option egress_ecn 'NOECN'
	option qdisc_really_really_advanced '1'
	option ilimit '2970'
	option elimit '550'

qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev eth0.1 root refcnt 2
qdisc cake 8013: dev eth0.2 root refcnt 2 bandwidth 500Kbit besteffort triple-isolate rtt 100.0ms raw
linklayer atm overhead 44 mtu 2047 tsize 512
qdisc ingress ffff: dev eth0.2 parent ffff:fff1 ----------------
qdisc noqueue 0: dev wlan0 root refcnt 2
qdisc cake 8014: dev ifb4eth0.2 root refcnt 2 bandwidth 2800Kbit besteffort triple-isolate wash rtt 100.0ms raw
linklayer atm overhead 44 mtu 2047 tsize 512

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 30819992 bytes 113892 pkt (dropped 0, overlimits 0 requeues 1)
backlog 0b 0p requeues 1
maxpacket 649 drop_overlimit 0 new_flow_count 1 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.1 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8013: dev eth0.2 root refcnt 2 bandwidth 500Kbit besteffort triple-isolate rtt 100.0ms raw
Sent 6128337 bytes 27559 pkt (dropped 150, overlimits 13648 requeues 0)
backlog 0b 0p requeues 0
memory used: 124992b of 4Mb
capacity estimate: 500Kbit
Tin 0
thresh 500Kbit
target 36.4ms
interval 131.4ms
pk_delay 37.7ms
av_delay 2.5ms
sp_delay 80us
pkts 27709
bytes 6307106
way_inds 196
way_miss 1138
way_cols 0
drops 150
marks 0
sp_flows 0
bk_flows 3
un_flows 0
max_len 1696

qdisc ingress ffff: dev eth0.2 parent ffff:fff1 ----------------
Sent 17335941 bytes 26457 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8014: dev ifb4eth0.2 root refcnt 2 bandwidth 2800Kbit besteffort triple-isolate wash rtt 100.0ms raw
Sent 20318928 bytes 25665 pkt (dropped 792, overlimits 20161 requeues 0)
backlog 0b 0p requeues 0
memory used: 67456b of 4Mb
capacity estimate: 2800Kbit
Tin 0
thresh 2800Kbit
target 6.5ms
interval 101.5ms
pk_delay 32.8ms
av_delay 15.5ms
sp_delay 249us
pkts 26457
bytes 21648433
way_inds 0
way_miss 1141
way_cols 0
drops 792
marks 0
sp_flows 0
bk_flows 1
un_flows 0
max_len 1749

Also yes, I do all my tests on ethernet

I think I know what the problem is here, but I'm not sure of the solution.

The speedtests never seem to push towards the modem sync speed, which is why the bufferbloat appears to be low there, showing the A rating. However, if I put up downloads and start doing more bandwidth-demanding things on other devices as well, the speeds look like they almost reach the modem's sync speed, as seen in the screenshot. This is where the internet really slows down to a crawl.

DSL Reports says your sync speed is 2651 down and 366 up.

Your speed test results are pretty close to sync.

If you're syncing at half speed...you likely have a line issue.

Are you selecting the servers for the DSL Reports test, or is that the default they are giving you?

Is your network quiet when you're doing these tests?

So one thing to consider is that your link is really slow, sending a packet upstream takes around:
1000 * (32 * 53 * 8) / (640 * 1000) = 21.2 ms
and on downstream:
1000 * (32 * 53 * 8) / (3550 * 1000) = 3.8 ms
so on a fully loaded link (in both directions) you will see an average latency-under-load increase of >= 25 ms. SQM cannot really work wonders if there is a chronic bandwidth undersupply...
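
For anyone who wants to redo this arithmetic with their own numbers, here is a small Python sketch. It assumes a full-size packet occupies 32 ATM cells of 53 bytes each and uses the modem sync rates quoted earlier in the thread; the function name is just made up for this example.

```python
def serialization_delay_ms(wire_bytes, link_kbps):
    """Time in milliseconds to put wire_bytes onto a link of link_kbps kbit/s."""
    return 1000 * wire_bytes * 8 / (link_kbps * 1000)

# Assumption: one full-size packet occupies 32 ATM cells of 53 bytes each.
FULL_PACKET_WIRE_BYTES = 32 * 53

up_ms = serialization_delay_ms(FULL_PACKET_WIRE_BYTES, 640)     # upstream sync rate
down_ms = serialization_delay_ms(FULL_PACKET_WIRE_BYTES, 3550)  # downstream sync rate

print(f"upstream:   {up_ms:.1f} ms per packet")
print(f"downstream: {down_ms:.1f} ms per packet")
print(f"both directions loaded: about {up_ms + down_ms:.1f} ms extra")
```

This only covers the time spent waiting behind a single full-size packet in each direction; any real queue in front of your packet adds on top of that.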

That looks pretty dire, especially the uplink; I guess just having a single flow on the uplink is just painful...
But mostly the quality score "F" seems cause for concern.

I have a hunch that per-internal-host-IP isolation might help a bit by at least trying to isolate the individual hosts a bit better (but with your up- and downlinks you can expect no wonders...)

Thanks for helping moeller0

Indeed the line is very slow, which is why I want to squeeze out what little performance I can to keep it as smooth as can be.

Would you be able to answer what is sending the connection into overdrive? I have clear rules set to limit it to 2900kbps but if you look at the performance screen shot it is going well over the limit. Something is perhaps overriding it?

Great, let's see how far we can get this then. Could you add the following to /etc/config/sqm:

	option linklayer_advanced '1'
	option tcMTU '2047'
	option tcTSIZE '128'
	option tcMPU '64'
	option qdisc_advanced '1'
	option ingress_ecn 'ECN'
	option egress_ecn 'NOECN'
	option qdisc_really_really_advanced '1'
	option iqdisc_opts 'nat dual-dsthost'
	option eqdisc_opts 'nat dual-srchost'
	option linklayer_adaptation_mechanism 'cake'

That should give you per-internal-IP isolation which should be a bit easier to understand than the default triple-isolate. Also, if you have not already done so, follow the instructions on https://github.com/moeller0/ATM_overhead_detector/blob/master/ATM_overhead_detector.m to empirically deduce the real overhead on your link (please post the two resulting images here in this thread).
One more question, to your knowledge is your ISP using PPPoE or PPPoA?

Once we have the overhead accounted for correctly, I would propose raising the egress rate to 99% of the modem's sync rate (which will work unless your ISP also uses a shaper at the BRAS/BNG level) and then iteratively trying to figure out a decent downstream shaper setting.
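
As a rough sketch of what those numbers could look like, here is a bit of Python that turns percentages of the sync rate into values for the upload and download fields. The sync rates are the ones reported by the modem earlier in the thread, and the downstream percentages are only example starting points of my own choosing, not recommendations; the helper name is made up.

```python
def shaper_rate_kbps(sync_kbps, percent):
    """Shaper rate in kbit/s as a whole-number percentage of the sync rate."""
    return sync_kbps * percent // 100

UP_SYNC_KBPS = 640     # assumed upstream sync rate from the modem
DOWN_SYNC_KBPS = 3550  # assumed downstream sync rate from the modem

# Egress: 99% of sync, as proposed above.
egress_kbps = shaper_rate_kbps(UP_SYNC_KBPS, 99)

# Ingress: candidate settings to test one after another (example percentages).
ingress_candidates_kbps = [shaper_rate_kbps(DOWN_SYNC_KBPS, p) for p in (95, 90, 85, 80)]

print("upload:", egress_kbps)
print("download candidates:", ingress_candidates_kbps)
```

The idea would be to start with the highest download candidate, run a saturating test, and step down until the latency under load stays acceptable.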

Best Regards

I would not read too much into the real time graphs, at least not at the resolution of individual datapoints/spikes

I think I would start looking at DSL line stats and diagnosing the line issues before doing any more LEDE changes...

http://www.kitz.co.uk/adsl/linestats_errors.htm

I have done the first part for /etc/config/sqm, but I am not sure about the ATM overhead detector you are talking about. There are so many lines of commands and lingo I have never heard of. I am not sure what to do :frowning:

And also my ISP uses PPPoE

Ah sorry, I accidentally linked to one of the code files. I had intended to link to https://github.com/moeller0/ATM_overhead_detector instead, which should give reasonable instructions on how to carry out the measurements. Maybe that will actually be useful.

Okay, that already restricts potential values for the overhead...

That is important. When packets need to be re-sent due to line errors, there is severe latency. Also the speed is uncertain. It's never going to work very well. Raw speedtests (direct to the modem with no other usage) should show the rate you are subscribed for.

Yeah I have tried many settings for the overhead thing. There is no difference.

If I put a download on it just kills the internet pretty much. Is there possibly a setting within the firmware to automatically detect downloads and make sure that it doesn't use more than 90% of the bandwidth?

I have no doubt there would be line errors. It's Aussie internet for ya. Being 4 km in cable length from the exchange doesn't help the situation either.

Oh, then there is something wrong: underestimating the overhead (and/or overestimating the bandwidth) should noticeably increase the measurable bufferbloat under saturating loads. But I guess since you already suffer from bufferbloat, nothing would change from your perspective (except maybe the underlying root cause of the observed bufferbloat).
But @jwoods seems to be on to something: how about you have a look at your modem's error counters before and after an extensive speedtest? By the way, both of your speedtests show an extreme number of retransmits, which might explain the observed behaviour somewhat. Does your ISP use G.INP by any chance?

I don't think so, that's a type of line profile, isn't it?

On that subject, do you think my line profile could have an effect? Internode lets me pick different profiles from their website, depending on how I want to use the line. They range from "very reliable" to "very fast" and there are also some "low latency" profiles. They also come in ADSL or ADSL2+ types.

I currently use ADSL2+ low latency. Low latency profiles turn off interleaving. Could that be an issue?