SQM Optimal Settings For My DSL

Hello, I have done the ATM Overhead Detector test and here are the results


I did use the 40-byte overhead, but I still get occasional bufferbloat/spikes. Are there any more options that I should turn on for my DSL? Let me mention that I get better bufferbloat results (fewer massive spikes) if I use None in link layer adaptation, even though my line is indeed ATM-enabled.

Ermm, could you repeat the ping_collector step with SWEEP_N_ATM_CELLS=4? It would be really helpful to see more than one step transition in the data... (the default should be SWEEP_N_ATM_CELLS=3, so I am puzzled why we only see this little amount of data; could be a change to the script or a bug)...

Could you post the output of:

  1. cat /etc/config/sqm
  2. tc -s qdisc
  3. tc -d qdisc
  4. ifstatus wan

please.

config queue 'eth1'
    option interface 'eth0'
    option debug_logging '0'
    option verbosity '5'
    option qdisc 'cake'
    option qdisc_advanced '1'
    option squash_dscp '1'
    option squash_ingress '1'
    option ingress_ecn 'ECN'
    option egress_ecn 'NOECN'
    option qdisc_really_really_advanced '1'
    option iqdisc_opts 'nat dual-dsthost'
    option eqdisc_opts 'nat dual-srchost'
    option linklayer 'none'
    option script 'layer_cake.qos'
    option download '3000'
    option upload '640'
    option enabled '1'
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8016: dev eth0 root refcnt 2 bandwidth 640Kbit diffserv3 dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
Sent 42353472 bytes 356890 pkt (dropped 734, overlimits 167240 requeues 0)
backlog 0b 0p requeues 0
memory used: 103296b of 4Mb
capacity estimate: 640Kbit
min/max network layer size:           42 /    1474
min/max overhead-adjusted size:       42 /    1474
average network hdr offset:           14

               Bulk  Best Effort        Voice
thresh         40Kbit      640Kbit      160Kbit
target        457.8ms       28.6ms      114.5ms
interval      915.6ms      123.6ms      228.9ms
pk_delay          0us       12.8ms       12.8ms
av_delay          0us        1.1ms        2.1ms
sp_delay          0us         14us         10us
backlog            0b           0b           0b
pkts                0       356690          934
bytes               0     43127143       126892
way_inds            0         1208            0
way_miss            0         2134           20
way_cols            0            0            0
drops               0          734            0
marks               0            0            0
ack_drop            0            0            0
sp_flows            0            0            0
bk_flows            0            2            0
un_flows            0            0            0
max_len             0         1474          590
quantum           300          300          300

qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
Sent 498851941 bytes 485270 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev wlan0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
Sent 8727637458 bytes 7470835 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 1474 drop_overlimit 0 new_flow_count 284 ecn_mark 0
new_flows_len 1 old_flows_len 6
qdisc noqueue 0: dev br-wan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8017: dev ifb4eth0 root refcnt 2 bandwidth 3Mbit besteffort dual-dsthost nat wash no-ack-filter split-gso rtt 100.0ms raw overhead 0
Sent 477846965 bytes 465146 pkt (dropped 20124, overlimits 821746 requeues 0)
backlog 0b 0p requeues 0
memory used: 216Kb of 4Mb
capacity estimate: 3Mbit
min/max network layer size:           60 /    1474
min/max overhead-adjusted size:       60 /    1474
average network hdr offset:           14

                  Tin 0
thresh          3Mbit
target          6.1ms
interval      101.1ms
pk_delay        2.2ms
av_delay        178us
sp_delay          6us
backlog            0b
pkts           485270
bytes       505645721
way_inds         2492
way_miss         2158
way_cols            0
drops           20124
marks               0
ack_drop            0
sp_flows            4
bk_flows            1
un_flows            0
max_len          1474
quantum           300

qdisc noqueue 0: dev lo root refcnt 2
qdisc cake 8016: dev eth0 root refcnt 2 bandwidth 640Kbit diffserv3 dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc fq_codel 0: dev wlan0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
qdisc noqueue 0: dev br-wan root refcnt 2
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc cake 8017: dev ifb4eth0 root refcnt 2 bandwidth 3Mbit besteffort dual-dsthost nat wash no-ack-filter split-gso rtt 100.0ms raw overhead 0

{
    "up": true,
    "pending": false,
    "available": true,
    "autostart": true,
    "dynamic": false,
    "uptime": 70398,
    "l3_device": "br-wan",
    "proto": "dhcp",
    "device": "br-wan",
    "updated": [
            "addresses",
            "routes",
            "data"
    ],
    "metric": 0,
    "dns_metric": 0,
    "delegation": true,
    "ipv4-address": [
            {
                    "address": "192.168.1.2",
                    "mask": 24
            }
    ],
    "ipv6-address": [

    ],
    "ipv6-prefix": [

    ],
    "ipv6-prefix-assignment": [

    ],
    "route": [
            {
                    "target": "0.0.0.0",
                    "mask": 0,
                    "nexthop": "192.168.1.1",
                    "source": "192.168.1.2\/32"
            }
    ],
    "dns-server": [
            "1.1.1.1",
            "1.0.0.1"
    ],
    "dns-search": [

    ],
    "inactive": {
            "ipv4-address": [

            ],
            "ipv6-address": [

            ],
            "route": [

            ],
            "dns-server": [

            ],
            "dns-search": [

            ]
    },
    "data": {
            "leasetime": 86400
    }
}

Just saw your ping_collector message, and yeah, I was running on default values. I'll rerun it with SWEEP_N_ATM_CELLS=4.

I changed my ADSL modulation to ADSL2+ (from ADSL), and these are the new results


going to rerun now with SWEEP_N_ATM_CELLS 4

here are the results for SWEEP_N_ATM_CELLS=4


It only produced a 12 MB file, by the way (all tests I ran only produce <20 MB; is this fine?).
This is what the command shows when executed

Here is the overhead script output as well

If you are on a real ATM link (ADSL1, ADSL2, or ADSL2+), you really need to set this to ATM, no ifs and buts.

Yeah, that can be painful, especially the Uplink.

Ah, I see you used the Windows version of the collector script, which sees very little testing (I only have Windows in a VM and I rarely start that up); I guess I should look inside that script again to figure out why it only seems to have tested around 50 bytes instead of 48*3 = 144. About the size of the files, no real idea either. I would guess that a longer runtime should result in larger files, but without seeing the files I can only wildly speculate, which is not going to help anybody.
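
To illustrate why a sweep of only about 50 bytes is not enough, here is a minimal Python sketch (not the actual collector script) that lists the probe sizes at which one more ATM cell is needed. It assumes 48 payload bytes per cell and lumps everything that is not counted in the probe size into a single `overhead_bytes` value; the function names are made up for this illustration.

```python
ATM_CELL_PAYLOAD = 48  # payload bytes carried by one 53-byte ATM cell


def cells_needed(size_bytes, overhead_bytes):
    """Number of ATM cells needed for size_bytes plus overhead_bytes."""
    total = size_bytes + overhead_bytes
    return (total + ATM_CELL_PAYLOAD - 1) // ATM_CELL_PAYLOAD  # integer ceiling


def step_transitions(span_bytes, overhead_bytes):
    """Sizes in 1..span_bytes at which the cell count increases by one."""
    return [
        size
        for size in range(1, span_bytes + 1)
        if cells_needed(size, overhead_bytes) > cells_needed(size - 1, overhead_bytes)
    ]


print(step_transitions(48 * 3, 40))  # sweep spanning three cells
print(step_transitions(50, 40))      # sweep of only about 50 bytes
```

Under these assumptions the three-cell sweep crosses three steps (at sizes 9, 57 and 105), while the 50-byte sweep crosses only one (at size 9), which is too little to confirm the 48-byte periodicity.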

Anyway, that plot now strongly indicates ATM cell encapsulation with an overhead of 40 Bytes.

You might want to add "ingress" to the list of cake keywords in the iqdisc_opts field, to be more robust against multiple concurrent ingress flows; that said at 3000/640 this is not going to make massive improvements.

BTW, what modem do you use and what statistics does your modem report? Is that modem running in bridged mode or did you have to configure a username and password (for PPPoE perhaps) on that modem?

I'm running OpenWrt on a Raspberry Pi 3B. The router is in PPPoE mode with Wi-Fi off, and the Raspberry Pi is connected to LAN port 1, which has DMZ enabled. The Pi is the only AP available, with SQM enabled on it. By statistics, do you mean the speed (4096 down/1024 up, if so) or the full connection statistics? Would it be better if I used a modem instead and did the PPPoE connection in OpenWrt?

Preferably the full connection statistics (and if possible the name and version of the modem).

That depends. If the Pi does the PPPoE decapsulation, it will be able to detect connection hangs and will have more freedom to hand out DHCP addresses, and you will not need double NAT. But for your ping spike issue, this change will not necessarily help...

[Screenshots: Screenshot_26, Screenshot_27]

The ISP router is a ZXHN H108N V2.5.

Thanks, now the next interesting piece of information would be the result of a dslreports speedtest, with SQM disabled and with SQM enabled. See https://forum.openwrt.org/t/sqm-qos-recommended-settings-for-the-dslreports-speedtest-bufferbloat-testing/2803 for thoughts on how to configure the test and how to report the results here in the forum.

The error counters, which I hoped to look at in detail are unfortunately a bit too terse (though 0/104 CRC errors is really low, unless the uptime of the link was well below one hour, but you are at 21915/3600 = 6.1 hours already, so the link seems to be clean).
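
The uptime arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and using 104 as the error count assumes it is the larger of the two CRC counters quoted above.

```python
def uptime_hours(uptime_seconds):
    """Convert a link uptime in seconds to hours."""
    return uptime_seconds / 3600


def errors_per_hour(error_count, uptime_seconds):
    """Average error rate over the whole uptime (assumes a non-zero uptime)."""
    return error_count / uptime_hours(uptime_seconds)


print(round(uptime_hours(21915), 1))          # hours the link has been up
print(round(errors_per_hour(104, 21915), 1))  # CRC errors per hour, on average
```

A rate normalized like this makes counters from sessions of different length comparable.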

SQM off http://www.dslreports.com/speedtest/52480349
SQM on http://www.dslreports.com/speedtest/52480584
The speeds are fluctuating because I'm not the only person using the internet right now. But still, even with a bufferbloat rating of A, if I play a game and then try downloading something on a different device, I still get huge lag spikes. I even get an A+ rating sometimes, but it keeps lagging as well.

These seem to be two copies of the same link, I would guess with SQM active.

Fair enough, makes interpretation a bit harder, but not by much. (Except the SQM off test would be best with no other traffic active).

Do you have data that shows this? I guess one issue is that at 3000/640 you might be really bandwidth starved, and then all SQM can do is move the pain around...

Silly question, there is no other device connected to the ZTE router, only the pi3b?

sorry about that, fixed

Data like what? Sometimes it works fine with only 5-10 ms of added lag, but recently it's been so bad.

and yes there are no other devices connected to the router

Thanks, it is clear to me that SQM really does help your link to be usable under load. Now, let's try to tackle the ping spike issue.

Maybe a packet trace or the output of an mtr/winmtr session running during the experienced latency spike?

How bad exactly? One thing to try when things run badly would be a dslreports speedtest, followed by another dslreports speedtest after halving both the ingress and egress shaper rates (to check whether the issue might be related to your DSLAM's uplink; in that case SQM on your router will need to be set to the lowest reliably reachable shaper rates). It would also be good to monitor the modem's error counters: line instabilities and CRC errors will cause havoc to your data transfers.

In reality all these hypotheses are rather weak, but without alternatives they may still be worth researching.

SQM is definitely doing something, but the problem is that I get way better results with None link layer adaptation than with ATM and 40 bytes of overhead.

I see, could you run top -d 1 during one of your tests that shows ping spikes and look at the sirq percentage of the Pi? For testing you could run a dslreports speedtest, or even the fast.com speedtest (which also allows configuring upload testing as well as longer run-times), while visually monitoring sirq.
Or you could try the speedtest package under https://forum.openwrt.org/t/speedtest-new-package-to-measure-network-performance/24647 as that will also look at the CPU load during the test.

That points to a bug or overload somewhere. Thanks to your measurements we have proof of the overhead being 40 bytes and the encapsulation being ATM, so pretending this was not true will run the shaper at higher rates than you would think. Especially with small packets, not accounting for ATM cell encapsulation will underestimate the effective packet size by 50% to 33%, and that will make your shaper ineffective against bufferbloat if the packet size mix on your link contains too many small packets. That said, unless the "bug" is fixed I do not blame you for using the "none" option, as the reason for running SQM is not being theoretically correct but being practically better.
I would just like to figure out why link-layer accounting does not seem to work for you...
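
As a rough illustration of the accounting gap, here is a minimal Python sketch (not what cake or sqm-scripts actually run) of the usual ATM calculation: packet plus overhead is padded up to whole 48-byte cell payloads, and each cell occupies 53 bytes on the wire. The 40-byte default is the overhead measured in this thread; the function names are hypothetical, and the exact percentages depend on which bytes you count as the packet size.

```python
ATM_CELL_SIZE = 53     # bytes on the wire per ATM cell
ATM_CELL_PAYLOAD = 48  # payload bytes per ATM cell


def atm_wire_size(packet_bytes, overhead_bytes=40):
    """Bytes occupied on an ATM link by one packet plus its overhead."""
    total = packet_bytes + overhead_bytes
    cells = (total + ATM_CELL_PAYLOAD - 1) // ATM_CELL_PAYLOAD  # integer ceiling
    return cells * ATM_CELL_SIZE


def underestimate(packet_bytes, overhead_bytes=40):
    """Fraction of the wire size missed when only packet_bytes is counted."""
    return 1 - packet_bytes / atm_wire_size(packet_bytes, overhead_bytes)


for size in (64, 576, 1500):
    print(size, atm_wire_size(size), f"{underestimate(size):.0%}")
```

With these assumptions a 64-byte packet occupies 159 bytes on the wire and a 1500-byte packet 1749 bytes, so the relative error is far larger for small packets than for large ones.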

CPU: 0% usr 0% sys 0% nic 99% idle 0% io 0% irq 0% sirq
It's always like this; CPU usage never goes past 5%, and sirq 2%.