Help with bufferbloat on R7800 using OpenWrt

I have an R7800 and installed OpenWrt 18.06.2. The problem I am having is that bufferbloat still persists even when I have SQM QoS on. Here are my settings:

Interface name: eth0.2 (wan, wan6)
Download speed (kbit/s) (ingress), set to 0 to selectively disable ingress shaping: 200000
Upload speed (kbit/s) (egress), set to 0 to selectively disable egress shaping: 18000
Queuing discipline: cake
Queue setup script: piece_of_cake.qos
Which link layer to account for: ethernet
Per Packet Overhead (byte): 22

My speeds with qos off:

Capture4

The modem I'm using is a SURFboard SBG6580 in bridge mode. I am not sure what else to do. Anyone have any suggestions?

Since I don't see the delay profile there, it's hard to know what "problem" you're still having.

I'm guessing you've got Comcast, which is DOCSIS 3.1 in the Bay Area, so there is generally already PIE in place, though it looks like the SBG6580 may only be DOCSIS 3.0.

10% below unmanaged speeds is a good starting point for CAKE (which is roughly where you are). You may need to play with it a bit and decide where the bandwidth vs. peak latency tradeoff works for you. (There's no magic, you make things smoother by trading bandwidth for latency profile.)
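To make the 10% rule of thumb concrete, here is a minimal sketch of the arithmetic. The measured speeds in the example call are hypothetical placeholders, not values from this thread, and the function name is made up for illustration.

```python
def shaper_rates(measured_down_kbit, measured_up_kbit, margin_percent=10):
    """Return (download, upload) shaper rates in kbit/s.

    Each rate is the measured, unshaped speed reduced by margin_percent.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    keep = 100 - margin_percent
    return (measured_down_kbit * keep // 100,
            measured_up_kbit * keep // 100)


# Hypothetical unshaped speedtest results: 220 Mbit/s down, 20 Mbit/s up.
down, up = shaper_rates(220_000, 20_000)
print(f"SQM download: {down} kbit/s, upload: {up} kbit/s")
```

This is only a starting point; the margin is the knob you then tune against the latency you observe.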

Sorry, my score is probably a D or F on bufferbloat. It never shows my grades for some reason on the website. And yes, the modem is DOCSIS 3.0 and my ISP is Spectrum. Do I need to get a DOCSIS 3.1 modem? I'm in Orange County. When I have QoS on these are my results:

Capture5

My ping when the download is going spikes from anywhere between 400-1000ms. Here is a link that shows the ping graph: http://www.dslreports.com/speedtest/48494276

I don't know Spectrum, and had guessed Bay Area due to the selection of Fremont and San Jose. Unless they're offering DOCSIS 3.1, changing your modem (assuming it isn't defective, and I don't think it's cursed with a Puma chip) probably won't make a difference.

I'm not getting the normal "emblem" from DSL Reports either, but a chart that is helpful (that you seem to have seen) is one like

image

400-1000 ms is pretty bad. Then again, 24 streams down seems excessive for a 200 Mbps line. When I run the "Cable" test here it uses 16/16 down/up. I'd try backing down the upstream bandwidth some to see what that does.

I ran the test again using 16 streams and the grade it gave me was a 'C' with the qos settings that I posted above. http://www.dslreports.com/speedtest/48494977

I am clueless why it is so bad with qos on. Lurked for the last few days already looking for answers and finally decided to post on here cause I can't figure it out.

  1. Are you testing using an ethernet cable to your laptop/desktop or wifi? (use ethernet)
  2. What interface do you have sqm applied to?
  3. What are your speeds? Go to https://www.nperf.com/en/ , click on server selection, and choose the one closest to your location.

Do a binary search on the download speed. Start with, say, 100 Mbps. If that gives good results, set it to 150; if that gives good results too, set it to 175; if not, set it to 125, and so on. Can you set the download speed in such a way that it gives good results? If not, then there's something wrong.
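The search described above can be sketched roughly as follows. This is only an illustration: `is_good` stands in for "set this download rate in SQM, run a speed test, and judge the latency by hand", and the function name, bounds and tolerance are assumptions.

```python
def find_best_rate(is_good, low_kbit, high_kbit, tolerance_kbit=5_000):
    """Bisect for the highest shaper rate at which is_good(rate) is true.

    Assumes latency only gets worse as the rate goes up. Returns None if
    even the lowest rate gives bad results, which points to a problem
    somewhere other than the shaper setting.
    """
    if not is_good(low_kbit):
        return None
    best = low_kbit
    while high_kbit - low_kbit > tolerance_kbit:
        mid = (low_kbit + high_kbit) // 2
        if is_good(mid):
            best = mid
            low_kbit = mid      # good result: try a higher rate next
        else:
            high_kbit = mid     # bad result: try a lower rate next
    return best


# Hypothetical stand-in for a manual test: pretend latency is fine up to 170 Mbit/s.
result = find_best_rate(lambda rate: rate <= 170_000, 10_000, 200_000)
print(f"highest good rate found: {result} kbit/s")
```

In practice each call to `is_good` is one manual speed test, so a coarse tolerance keeps the number of runs small.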

Ethernet cable to desktop

I'm not really network savvy, so I'm not entirely sure what you mean; quite a noob actually. I'm using luci-app-sqm and accessing it through 192.168.1.1

This is with sqm qos on using cake, piece_of_cake.qos

With sqm qos off:

I've tried this before. I tried everything between 20 Mbps and 210 Mbps, and the bufferbloat has been bad throughout. On an old N66U router I was using before, which had a QoS bandwidth limiter, I was able to set the bandwidth limit for my PC to 170 Mbps and get a bufferbloat score of B. I'm not sure if a bandwidth limiter is available on OpenWrt. Below is what it looked like:

Capture10

Would be cool if the sqm scripts could do this automatically.
I tried to make a shell script for that.
The idea was

  • update every x times
  • launch some sub shells
  • to monitor the bandwidth
  • start pinging some host on the internet
  • adjust bandwidth (with tc change) like you described until latency good.
    But I wasn't sure whether this would break things in cake, such as the nat feature, and I had no time to test any further, so I dropped it.
    Maybe some day there will be an adaptive adjustment feature for both down and up.
    For ingress there is autorate-ingress; maybe it's worth a try?
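The idea in the list above could look roughly like this. It is a sketch only, not a tested controller: the thresholds, step sizes and function names are assumptions, and it only builds the `tc qdisc change` command rather than running it, so whether a live change upsets cake features such as nat remains untested.

```python
def next_rate(rate_kbit, latency_ms, baseline_ms, *, threshold_ms=15.0,
              backoff_percent=90, step_kbit=1_000,
              floor_kbit=10_000, ceiling_kbit=200_000):
    """One control step: back off when latency is high, otherwise probe upward.

    latency_ms is the current ping time to some internet host, baseline_ms
    the ping time on an idle link. All defaults are illustrative assumptions.
    """
    if latency_ms - baseline_ms > threshold_ms:
        rate = rate_kbit * backoff_percent // 100   # multiplicative decrease
    else:
        rate = rate_kbit + step_kbit                # additive increase
    return max(floor_kbit, min(ceiling_kbit, rate))


def tc_change_command(dev, rate_kbit):
    """Build (but do not run) the tc command that would apply the new rate."""
    return ["tc", "qdisc", "change", "dev", dev, "root",
            "cake", "bandwidth", f"{rate_kbit}kbit"]


# Example step for the egress side: 120 ms under load against a 20 ms baseline.
rate = next_rate(18_000, latency_ms=120.0, baseline_ms=20.0)
print(" ".join(tc_change_command("eth0.2", rate)))
```

A real script would call this in a loop, feeding it fresh ping results and passing the command to something like `subprocess.run`.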

How do I set autorate-ingress?

Nice idea, have a look at gargoyle's active congestion controller (https://www.gargoyle-router.com/wiki/doku.php?id=qos). Close to what you propose; it also relies on the upstream being non-congested...

Just navigate to the SQM QoS tab in the LuCI GUI. Select the "Queue Discipline" sub-tab,

Check the box called:
"Show and Use Advanced Configuration. Advanced options will only be used as long as this box is checked."

Check the box called:
"Show and Use Dangerous Configuration. Dangerous options will only be used as long as this box is checked."

Add the following to the field called "Advanced option string to pass to the ingress queueing disciplines; no error checking, use very carefully."

autorate-ingress

While you are at it, make sure this field also contains the keywords "nat" and "ingress" (in case you desire per-internal-IP-fairness also add "dual-dsthost"). Separate the keywords with spaces, not commas.

and that the field called "Advanced option string to pass to the egress queueing disciplines; no error checking, use very carefully." contains:
"nat" (in case you desire per-internal-IP-fairness also add "dual-srchost")

That said, I am quite confident that autorate-ingress is not going to help. I would recommend trying the current master snapshots (make sure to also install the GUI).

I tried putting in auto-ingress, nat, dual-dsthost, but it didn't make a difference. Where do I go to get the current master snapshots?

Here is from cake's help (in the current master):

root@router:~# tc qdisc add root cake help
Usage: ... cake [ bandwidth RATE | unlimited* | **autorate-ingress** ]
                [ rtt TIME | datacentre | lan | metro | regional |
                  internet* | oceanic | satellite | interplanetary ]
                [ besteffort | diffserv8 | diffserv4 | diffserv3* ]
                [ flowblind | srchost | dsthost | hosts | flows |
                  dual-srchost | dual-dsthost | triple-isolate* ]
                [ nat | nonat* ]
                [ wash | nowash* ]
                [ split-gso* | no-split-gso ]
                [ ack-filter | ack-filter-aggressive | no-ack-filter* ]
                [ memlimit LIMIT ]
                [ fwmark MASK ]
                [ ptm | atm | noatm* ] [ overhead N | conservative | raw* ]
                [ mpu N ] [ ingress | egress* ]
                (* marks defaults)

So "autorate-ingress" it is, but I also predict "that autorate-ingress is not going to help".
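Since the advanced option fields do no error checking, it can help to check the string against the keywords in the help output above before pasting it in. The following is just a sketch: the keyword sets are copied from that help text, the function name is made up, and tc itself remains the authority on what it accepts.

```python
# Keywords from the cake help output that take no argument.
FLAGS = {
    "unlimited", "autorate-ingress",
    "datacentre", "lan", "metro", "regional", "internet", "oceanic",
    "satellite", "interplanetary",
    "besteffort", "diffserv8", "diffserv4", "diffserv3",
    "flowblind", "srchost", "dsthost", "hosts", "flows",
    "dual-srchost", "dual-dsthost", "triple-isolate",
    "nat", "nonat", "wash", "nowash", "split-gso", "no-split-gso",
    "ack-filter", "ack-filter-aggressive", "no-ack-filter",
    "ptm", "atm", "noatm", "conservative", "raw", "ingress", "egress",
}
# Keywords that are followed by one argument (RATE, TIME, LIMIT, MASK, N).
WITH_ARGUMENT = {"bandwidth", "rtt", "memlimit", "fwmark", "overhead", "mpu"}


def unknown_keywords(option_string):
    """Return the whitespace-separated tokens that are not in the help text."""
    unknown = []
    skip_next = False
    for token in option_string.split():
        if skip_next:               # this token is the argument of the previous keyword
            skip_next = False
        elif token in WITH_ARGUMENT:
            skip_next = True
        elif token not in FLAGS:
            unknown.append(token)
    return unknown


print(unknown_keywords("autorate-ingress nat dual-dsthost ingress"))
print(unknown_keywords("auto-ingress, nat, dual-dsthost"))
```

The second call flags both the misspelled keyword and the tokens that carry a trailing comma.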

Hmm, to me this implies it's not bufferbloat, at least not on your link. Either you have a driver bug or similar, or you have an ISP that's congested in its backhaul or something similar.

Yes, it's the same logic.
But Gargoyle's implementation is way too slow to find a suitable bandwidth.
https://www.cfos.de
also uses a "pinger" approach, I think, but UDP based.
Sadly it is only a standalone driver with an interface for Windows,
and it completely depends on rule sets.
But it also has an adaptive mode.

Cake's autorate-ingress feature is also a bit slow.
For example, when I set 400 Mbit/s for the downstream and do a speed test, it stays in the lower bandwidth range for several seconds before it starts to ramp up.
And in the YouTube debug window, where you can see your actual bandwidth, with the autorate feature disabled it shows around 100 Mbit/s or more, and with autorate enabled it shows speeds below 15 Mbit/s.
Maybe that is because the stream doesn't actually need that much bandwidth and cake estimated quite right here...

And it would be cool to have an option to set target manually,
so one could adjust both target and interval.

I guess there is a tradeoff to be made, between timely response and massive oscillations and/or false positives.

I believe it aims at adjusting to reasonably slow changes of bandwidth, but in all honesty I never had to use it myself.

I guess the problem is that it needs to estimate the bandwidth from the arrival times of the incoming packets, so it will be partly driven by the characteristics of the incoming traffic. I believe the IQrouter guys implement something where they run RTT tests concurrently with speedtests, which allows them a better estimate of the achievable bottleneck rate, which for slowly changing bandwidth seems a better approach than relying purely on the "accidental" traffic patterns, but again, never used that either.

I believe this is not exposed on purpose, since according to CoDel theory (see https://tools.ietf.org/html/rfc8289#page-14 for details) target does not need to be exposed. That said, cake actually manipulates target for some of its priority tiers, which IMHO somewhat invalidates the rationale for not exposing it. Still, current cake does not allow configuring the interval of the different priority tins independently, so not exposing target seems to be in line with the rest.

I don't have time to read through every one of these posts, but did you try connecting the pc directly to the modem and run the same test?

I found some CPU utilization problems with cake & piece_of_cake on my R7800 router. The default CPU frequency scaling really interfered with the SQM shaping.
Some notes over here in this thread:

I'm in Boston area with RCN cable modem with measured line speeds about 260/16.
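For the frequency-scaling point above, one way to see what the router is doing is to read the governor files under the standard Linux cpufreq sysfs path. This is a sketch under assumptions: switching to the "performance" governor is one possible remedy, not something this post confirms, the function names are made up, and Python would have to be installed on the router (the same files can be read and written by hand).

```python
from pathlib import Path

CPU_ROOT = "/sys/devices/system/cpu"   # standard Linux sysfs location


def read_governors(cpu_root=CPU_ROOT):
    """Return {cpu name: current scaling governor} for every CPU with cpufreq."""
    governors = {}
    for path in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def set_governor(governor, cpu_root=CPU_ROOT):
    """Write the given governor to every CPU (needs root on a real system)."""
    for path in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        path.write_text(governor + "\n")


# Read-only check; prints an empty dict on systems without cpufreq.
print(read_governors())
# Assumed remedy, left commented out on purpose:
# set_governor("performance")
```

Comparing speed test results before and after a governor change shows whether frequency scaling is the limiting factor on a given setup.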