SQM tweaks and best latency for connections over 650 Mbit/s

After a small discussion with @anon50098793 on Reddit (https://www.reddit.com/r/openwrt/comments/rdm1lh/optimize_sqm_settings), I wanted to ask: how can we achieve the best possible latency on connections of 650 Mbit/s and above?

For some, speed matters more; for others, latency does. The goal here is to find a good combination of the two.

So to narrow it down, this discussion targets people running OpenWrt on a Raspberry Pi 4 (either @anon50098793's community build or a stable build) and, of course, using SQM with Cake. Right now the best throughput/latency median is around ~400 Mbit with no added latency, but as many already know, most of that work is done by just one core out of four. So we are here to discuss tweaks, trials, test settings, and performance tuning to get the best throughput with zero added latency.

irqbalance, overclocking, packet steering, fq_codel... you name it! What are the best settings you use to get the most out of SQM with Cake?


I think packet steering works well with the Pi's four cores.

That, or manually distributing the relevant IRQs over the CPUs. I would probably try irqbalance first; it might not be as optimal as a fine-tuned manual assignment, but it is also less sensitive to changes in the preconditions (like something else starting to busy-wait on a core manually assigned to cake, if that is a thing).
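For the manual route, the idea is to write a CPU bitmask into each interrupt's smp_affinity file. A minimal sketch, assuming hypothetical IRQ numbers (check /proc/interrupts for the real eth0/eth1 lines on your board):

```shell
# Find the ethernet IRQ lines first:
#   grep -E 'eth0|eth1' /proc/interrupts
# Then pin them; values are hex bitmasks (CPU0=1, CPU1=2, CPU2=4, CPU3=8).
echo 2 > /proc/irq/31/smp_affinity   # example IRQ number: eth0 -> CPU1
echo 4 > /proc/irq/32/smp_affinity   # example IRQ number: eth1 -> CPU2
```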

HTB/fq_codel can be a bit cheaper on the CPU and lets you manually change the burst size, which, while adding a bit of latency under load, can help with throughput because it relaxes the shaper's pretty tight deadlines for being scheduled again.
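For reference, a rough sketch of what an HTB + fq_codel setup with a hand-tuned burst looks like at the tc level. This is an illustration, not the exact commands sqm-scripts generates; the interface and rates are taken from the examples in this thread. A common rule of thumb sizes the burst to a few milliseconds' worth of bytes at the shaper rate:

```shell
# burst sizing rule of thumb: bytes transmitted in dur_ms at rate_kbit
burst_bytes() {
    rate_kbit=$1 dur_ms=$2
    # rate_kbit * 1000 bit/s * (dur_ms / 1000) s / 8 = rate_kbit * dur_ms / 8 bytes
    echo $(( rate_kbit * dur_ms / 8 ))
}
burst_bytes 45000 4   # 4 ms at 45 Mbit/s -> 22500 bytes

# On the router (root required), roughly:
#   tc qdisc replace dev eth1 root handle 1: htb default 10
#   tc class add dev eth1 parent 1: classid 1:10 htb rate 45mbit burst 22500 cburst 22500
#   tc qdisc add dev eth1 parent 1:10 fq_codel target 5ms interval 100ms
```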


probably a good idea to get a better (if imperfect) baseline first... best to run these commands while the network is quiet... (or nobody else is home)

echo -n 0 > /proc/sys/net/ipv4/ip_forward
echo -n 0 > /proc/sys/net/ipv6/conf/eth1/forwarding

/etc/init.d/sqm stop

mtr -ezb4r 8.8.8.8
speedtest-ookla

/etc/init.d/sqm start

echo -n 1 > /proc/sys/net/ipv4/ip_forward
echo -n 1 > /proc/sys/net/ipv6/conf/eth1/forwarding

my humble docsis 5x/15ish for reference... my ISP's latency is very good and it's 12pm so there's low contention on the local loop
##################    Speedtest by Ookla
     Server: Vocus Communications - Adelaide (id = 18247)
        ISP: Aussie Broadband
    Latency:    31.98 ms   (1.82 ms jitter)
   Download:    54.38 Mbps (data used: 56.4 MB)                               
     Upload:    18.85 Mbps (data used: 9.2 MB)

######################## MTR
dca632 ../_rpi-perftweaks.sh_202105 53°# mtr -ezb4r 8.8.8.8 
Start: 2021-12-13T23:59:49+1100
HOST: rpi-dca6325631              Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS4764   loop180150640.bng1.  0.0%    10    9.2  10.8   8.4  28.0   6.1
  2. AS4764   HundredGigE0-0-0-8.  0.0%    10    8.4   8.9   8.4   9.9   0.5
  3. AS4764   be2.core2.nextdc-s1  0.0%    10    8.4   8.9   8.2   9.9   0.5
  4. AS4764   119-18-32-167.cust.  0.0%    10    8.9   8.6   8.0   9.2   0.3
  5. AS15169  72.14.237.251 (72.1  0.0%    10    9.3   9.3   8.9   9.9   0.3
  6. AS15169  142.250.212.135 (14  0.0%    10    9.1  10.5   9.0  17.6   2.6
  7. AS15169  dns.google (8.8.8.8  0.0%    10    8.5   8.4   7.8   9.1   0.4

@anon50098793 here are my results :slight_smile:

mtr -ezb4r 8.8.8.8
Start: 2021-12-13T16:03:03+0000
HOST: rpi4-router                 Loss%   Snt   Last   Avg  Best  Wrst StDev
@Not a TXT record
  1. AS???    ???                 100.0    10    0.0   0.0   0.0   0.0   0.0
  2. AS3209   de-dus01a-cr12-eth-  0.0%    10   30.3  12.2   7.8  30.3   6.5
  3. AS6830   de-fra04d-rc1-ae-19  0.0%    10   19.5  18.4  13.4  35.3   6.2
  4. AS6830   84.116.190.94 (84.1 20.0%    10   14.6  15.0  13.6  16.3   1.1
  5. AS15169  74.125.48.122 (74.1  0.0%    10   22.3  27.1  16.0  38.8   7.5
  6. AS15169  142.251.65.73 (142.  0.0%    10   20.1  17.9  15.5  20.9   1.6
  7. AS15169  172.253.64.119 (172  0.0%    10   16.2  18.3  16.2  26.1   2.9
  8. AS15169  dns.google (8.8.8.8  0.0%    10   18.6  17.2  13.9  23.0   3.2
speedtest-ookla

   Speedtest by Ookla

     Server: CoreRoute - Dusseldorf (id = 28439)
        ISP: Vodafone Germany Cable
    Latency:    13.11 ms   (8.46 ms jitter)
   Download:   938.37 Mbps (data used: 1.1 GB)
     Upload:    29.43 Mbps (data used: 28.2 MB)
Packet Loss:     0.0%

ouch!... just realized something that had not occurred to me earlier... (i was just ignorantly thinking cable = fibre)...

i was thinking... "i will just bump my cable up to 1G for a month and pay the extra"... then i realized: in AU (and probably everywhere else), over a certain bandwidth we get no guarantee on speed... (yes... docsis local loop, shared medium and all of that, but i'm trying to simplify)...

... and... i'm thinking we might need / benefit from something like @Lynx / @moeller0 et al.'s autorate, but with inverted logic, if we're gonna have any chance of reliably working at those speeds with any positive effect (as opposed to just setting an arbitrarily low ~450 Mbit/s sqm roof, as I think you also mentioned)...

what would be good... is if you run that mtr command a few more times at several points in the day (don't worry about the rest... so long as not too many people on your network are using it)... to get a better picture of congestion on the local loop... (while we wait for the sqm/docsis experts to advise)
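To automate that, a loop like this (hypothetical log path; adjust the count and interval to taste), left running in a screen/tmux session, would capture one sample per hour:

```shell
# take 24 hourly mtr samples to spot time-of-day congestion on the local loop
i=0
while [ "$i" -lt 24 ]; do
    date >> /tmp/mtr-baseline.log
    mtr -ezb4r 8.8.8.8 >> /tmp/mtr-baseline.log
    i=$(( i + 1 ))
    sleep 3600
done
```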


based on the above... you may need something like

option upload '26500'

You rarely do, for the simple reason that the sum of the access rates sold to customers in a segment is mostly >> the segment's aggregate rate capacity. But then again, Germany just introduced an official speedtest and ruled that if the test's results (measured over a campaign of about a week) indicate substantially less rate than contracted, users get the right to either immediately cancel the contract or lower their monthly bill in proportion to the under-performance. (This is fresh off the press; the modified testing tool was literally released today, so details are still a bit unclear, but it hopefully sets a precedent.) Not a strict rate guarantee, but a way to make ISPs at least put realistic numbers in their contracts. (This is based on EU law, so something similar might pop up in other EU countries, not that that helps you.)

What do you mean by inverted? The current testing branch lets you set one baseline shaper rate that the system will converge on if there is no load or delay, which might already be sufficient for your case. With load it will ramp up and with delay it will ramp down from this baseline value, so it is not a minimum rate.
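That convergence behaviour can be sketched as a toy control loop; all thresholds and step sizes below are made-up illustrations, and the real autorate script is far more refined:

```shell
# toy shaper-rate controller: ramp down on delay, ramp up under load,
# otherwise drift back toward the baseline rate (all rates in kbit/s)
update_rate() {
    rate=$1 loaded=$2 delay_ms=$3
    base=450000 step=10000 delay_thr=15
    if [ "$delay_ms" -gt "$delay_thr" ]; then
        rate=$(( rate - step ))   # bufferbloat detected: back off
    elif [ "$loaded" -eq 1 ]; then
        rate=$(( rate + step ))   # loaded and low delay: probe upward
    elif [ "$rate" -gt "$base" ]; then
        rate=$(( rate - step ))   # idle: decay toward baseline
    elif [ "$rate" -lt "$base" ]; then
        rate=$(( rate + step ))
    fi
    echo "$rate"
}
```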


I can confirm that 1G cable is the worst 1G speed you can get. After seeing the mtr latency numbers I was thinking about going back to the 250 Mbit tariff, as it was more stable than 1G. I think (just think, not sure) my ISP, Vodafone Germany, gives you a guaranteed speed on the lower tariffs, but the higher ones are most likely shared. Anyway, I will run mtr at different times. Becoming sure that it's my ISP now :dizzy_face:


gotta love germany when it comes to tech... in AU... they just imposed a rule where ISPs aren't allowed to tell customers a set rate LOL... intermittent mini dropouts are the killer tho... practically no SLA on those... so you could have a connection that cuts out 15 times a day for 3 weeks... and if it's good when they test it... there is no problem!

roger that... and i'll update my sources and re-test things here a bit deeper re: latency etc. cheers... (i'd definitely advise the OP to grab @Lynx's testing github script and give it a go directly - once a nice baseline sqm config is set)

Nah, the law applies to them as well; look at the Produktinformationsblatt (PIB) to see what they promise, and read up in this german article on how to run an official speedtest measurement campaign.... But the truth of the matter is that a DOCSIS segment with 100s of users only has ~2 Gbps of download capacity; it is clear that 1 Gbps all of the time will not be feasible.

But let's see what you got first. Please post the output of the following commands executed on your router:
tc -s qdisc
cat /etc/init.d/sqm
ifstatus wan


Nah, that is the exception; otherwise it is not bad, but we are no fore-runners (and we are much more densely populated than AU, so much easier to wire up).


Thank you, interesting to know. So here are the outputs:

tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root
 Sent 162426739222 bytes 120007311 pkt (dropped 57715, overlimits 0 requeues 1563216)
 backlog 0b 0p requeues 1563216
qdisc fq_codel 0: dev eth0 parent :5 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 41164373352 bytes 32143364 pkt (dropped 1450, overlimits 0 requeues 482977)
 backlog 0b 0p requeues 482977
  maxpacket 66616 drop_overlimit 0 new_flow_count 405177 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 33496967674 bytes 23153963 pkt (dropped 16373, overlimits 0 requeues 623026)
 backlog 0b 0p requeues 623026
  maxpacket 66616 drop_overlimit 9088 new_flow_count 172129 ecn_mark 0 drop_overmemory 9088
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 17020259 bytes 34759 pkt (dropped 0, overlimits 0 requeues 1072)
 backlog 0b 0p requeues 1072
  maxpacket 42392 drop_overlimit 0 new_flow_count 31 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 23586698587 bytes 16955882 pkt (dropped 38327, overlimits 0 requeues 404761)
 backlog 0b 0p requeues 404761
  maxpacket 68130 drop_overlimit 25856 new_flow_count 105961 ecn_mark 0 drop_overmemory 25856
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 64161679350 bytes 47719343 pkt (dropped 1565, overlimits 0 requeues 51380)
 backlog 0b 0p requeues 51380
  maxpacket 68130 drop_overlimit 1472 new_flow_count 670473 ecn_mark 0 drop_overmemory 1472
  new_flows_len 0 old_flows_len 0
qdisc cake 810c: dev eth1 root refcnt 2 bandwidth 45Mbit besteffort dual-srchost nat nowash ack-filter split-gso rtt 100ms noatm overhead 18 mpu 64
 Sent 23777 bytes 175 pkt (dropped 0, overlimits 105 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 13760b of 4Mb
 capacity estimate: 45Mbit
 min/max network layer size:           29 /    1278
 min/max overhead-adjusted size:       64 /    1296
 average network hdr offset:            7

                  Tin 0
  thresh         45Mbit
  target            5ms
  interval        100ms
  pk_delay        122us
  av_delay          5us
  sp_delay          1us
  backlog            0b
  pkts              175
  bytes           23777
  way_inds           10
  way_miss           62
  way_cols            0
  drops               0
  marks               0
  ack_drop            0
  sp_flows            7
  bk_flows            2
  un_flows            0
  max_len          1292
  quantum          1373

qdisc ingress ffff: dev eth1 parent ffff:fff1 ----------------
 Sent 31706 bytes 182 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 810d: dev ifb4eth1 root refcnt 2 bandwidth 450Mbit besteffort dual-dsthost nat wash ingress no-ack-filter split-gso rtt 100ms noatm overhead 18 mpu 64
 Sent 34294 bytes 182 pkt (dropped 0, overlimits 11 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 5098b of 15140Kb
 capacity estimate: 450Mbit
 min/max network layer size:           46 /    1500
 min/max overhead-adjusted size:       64 /    1518
 average network hdr offset:            7

                  Tin 0
  thresh        450Mbit
  target            5ms
  interval        100ms
  pk_delay         17us
  av_delay          1us
  sp_delay          1us
  backlog            0b
  pkts              182
  bytes           34294
  way_inds           15
  way_miss           65
  way_cols            0
  drops               0
  marks               0
  ack_drop            0
  sp_flows            7
  bk_flows            1
  un_flows            0
  max_len          3028
  quantum          1514
 cat /etc/init.d/sqm
#!/bin/sh /etc/rc.common

START=50
USE_PROCD=1

service_triggers()
{
        procd_add_reload_trigger "sqm"
}

reload_service()
{
        stop "$@"
        start "$@"
}

start_service()
{
        /usr/lib/sqm/run.sh start "$@"
}

stop_service()
{
        /usr/lib/sqm/run.sh stop "$@"
}

boot()
{
        export SQM_VERBOSITY_MIN=5 # Silence errors
        start "$@"
}
ifstatus wan
{
        "up": true,
        "pending": false,
        "available": true,
        "autostart": true,
        "dynamic": false,
        "uptime": 53356,
        "l3_device": "eth1",
        "proto": "dhcp",
        "device": "eth1",
        "metric": 0,
        "dns_metric": 0,
        "delegation": true,
        "ipv4-address": [
                {
                        "address": "<masked>",
                        "mask": 21
                }
        ],
        "ipv6-address": [

        ],
        "ipv6-prefix": [

        ],
        "ipv6-prefix-assignment": [

        ],
        "route": [
                {
                        "target": "0.0.0.0",
                        "mask": 0,
                        "nexthop": "<masked>",
                        "source": "<masked>/32"
                }
        ],
        "dns-server": [
                "9.9.9.9"
        ],
        "dns-search": [

        ],
        "neighbors": [

        ],
        "inactive": {
                "ipv4-address": [

                ],
                "ipv6-address": [

                ],
                "route": [

                ],
                "dns-server": [
                        "80.69.96.12",
                        "81.210.129.4"
                ],
                "dns-search": [
                        "upc.de"
                ],
                "neighbors": [

                ]
        },
        "data": {
                "hostname": "f88b374afbdf",
                "leasetime": 7200
     

Ouch, muscle memory kicked in and I typed something I did not really want to see. Could you accept my apology and post the output of:
cat /etc/config/sqm
instead?


no worries :slight_smile:

config queue 'eth1'
        option verbosity '5'
        option interface 'eth1'
        option debug_logging '1'
        option ingress_ecn 'ECN'
        option squash_ingress '1'
        option qdisc_really_really_advanced '1'
        option qdisc_advanced '1'
        option egress_ecn 'NOECN'
        option squash_dscp '1'
        option iqdisc_opts 'docsis besteffort ingress nat dual-dsthost'
        option eqdisc_opts 'docsis nat ack-filter dual-srchost'
        option linklayer 'none'
        option enabled '1'
        option script 'piece_of_cake.qos'
        option qdisc 'cake'
        option download '450000'
        option upload '45000'

This looks pretty tricked out as far as configuring sqm/cake is concerned. I guess the next step for increasing the shaper rate while keeping latency low is probably to test whether enabling packet steering will allow higher rates with acceptable bufferbloat.
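For anyone following along, OpenWrt's built-in packet steering (the same thing as the LuCI checkbox) can be toggled from the CLI like this:

```shell
# enable the distribution's packet steering hotplug script
uci set network.globals.packet_steering='1'
uci commit network
/etc/init.d/network restart
```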


i'd start with something like;

        option download '550000'
        option upload '26500'

/etc/init.d/sqm restart


I have packet steering enabled via LuCI, not the CLI, but I guess it works. For the shaper rate, just pass me the commands and I will give it a try.

sure I can try that


luci/uci will set you up with a mask of f (all 4 cores)... i found that problematic for latency... ymmv

my build on boot will set you up with the following (if you disable the uci setting);

echo -n 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
echo -n 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus
echo -n 4 > /sys/class/net/eth0/queues/tx-2/xps_cpus
echo -n 4 > /sys/class/net/eth0/queues/tx-3/xps_cpus
echo -n 2 > /sys/class/net/eth0/queues/tx-4/xps_cpus
echo -n 7 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo -n 7 > /sys/class/net/eth1/queues/rx-0/rps_cpus
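The values written above are hex CPU bitmasks (CPU0 = 1, CPU1 = 2, CPU2 = 4, CPU3 = 8), so 7 means CPUs 0-2 and c means CPUs 2-3. A small helper, just a sketch for illustration, makes the mapping explicit:

```shell
# convert a list of CPU indices to the hex mask format of xps_cpus/rps_cpus
cpus_to_mask() {
    mask=0
    for c in "$@"; do
        mask=$(( mask | (1 << c) ))
    done
    printf '%x\n' "$mask"
}
cpus_to_mask 0 1 2   # 7
cpus_to_mask 2 3     # c
```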

i have also used a few others in the past... one is maybe

echo -n 7 > /sys/class/net/eth0/queues/tx-0/xps_cpus
echo -n 7 > /sys/class/net/eth0/queues/tx-1/xps_cpus
echo -n 7 > /sys/class/net/eth0/queues/tx-2/xps_cpus
echo -n 7 > /sys/class/net/eth0/queues/tx-3/xps_cpus
echo -n 7 > /sys/class/net/eth0/queues/tx-4/xps_cpus
echo -n c > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo -n c > /sys/class/net/eth1/queues/rx-0/rps_cpus

ymmv... still looking for my better notes... (essentially i move nlbwmon, luci-statistics and a few other bursty / non-essential tasks to core 4 (3 if counting from zero), then try to avoid that 4th core for networking stuff)...
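Moving those daemons could look roughly like this (the service names are just the examples mentioned above; taskset needs the taskset package, it is not in the default busybox):

```shell
# pin bursty, non-essential daemons to CPU3, leaving CPUs 0-2 for networking
for svc in nlbwmon collectd; do
    for pid in $(pidof "$svc"); do
        taskset -pc 3 "$pid"   # CPU list "3" = the 4th core
    done
done
```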

i'd start with something like;

        option download '550000'
        option upload '26500'

/etc/init.d/sqm restart
still +12 ms of extra latency


Just to be clear, I should disable packet steering in LuCI and try running:

echo -n 4 > /sys/class/net/eth0/queues/tx-0/xps_cpus
echo -n 4 > /sys/class/net/eth0/queues/tx-1/xps_cpus
echo -n 4 > /sys/class/net/eth0/queues/tx-2/xps_cpus
echo -n 4 > /sys/class/net/eth0/queues/tx-3/xps_cpus
echo -n 4 > /sys/class/net/eth0/queues/tx-4/xps_cpus
echo -n c > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo -n c > /sys/class/net/eth1/queues/rx-0/rps_cpus

from the command line, right?