OpenWrt SQM - Bufferbloat working or not working?

At times of high upload throughput on my home connection I face the bufferbloat problem.

The test below shows proof of this:

xx@xx:~# speedtest-netperf.sh -H netperf-eu.bufferbloat.net
2021-09-02 20:41:46 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-eu.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
............................................................
 Download:  53.41 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  13.113
    10pct:  13.504
   Median:  16.357
      Avg:  16.617
    90pct:  19.323
      Max:  25.374
 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 57 samples]
     cpu0:  30.7 +/-  2.0  @ 1459 MHz
     cpu1:  22.8 +/-  2.8  @ 1025 MHz
 Overhead: [in % used of total CPU available]
  netperf:  14.7
.............................................................
   Upload:  18.50 Mbps
  Latency: [in msec, 62 pings, 0.00% packet loss]
      Min:  12.746
    10pct: 165.244
   Median: 210.922
      Avg: 200.306
    90pct: 233.306
      Max: 239.517
 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 58 samples]
     cpu0:  17.0 +/-  4.6  @ 1193 MHz
     cpu1:  12.1 +/-  2.3  @  857 MHz
 Overhead: [in % used of total CPU available]
  netperf:   1.7

When I enable SQM on the pppoe-wan interface with the DSL parameters recommended on the OpenWrt website, that appears to do the trick:

xx@xx:~# speedtest-netperf.sh -H netperf-eu.bufferbloat.net
2021-09-02 20:57:10 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-eu.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
............................................................
 Download:  52.74 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  13.781
    10pct:  14.631
   Median:  15.818
      Avg:  15.836
    90pct:  17.142
      Max:  19.760
 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 57 samples]
     cpu0:  34.2 +/-  1.9  @ 1712 MHz
     cpu1:  27.1 +/-  3.3  @ 1712 MHz
 Overhead: [in % used of total CPU available]
  netperf:  17.5
.............................................................
   Upload:  17.52 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  13.753
    10pct:  14.310
   Median:  15.237
      Avg:  15.882
    90pct:  17.165
      Max:  29.148
 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 57 samples]
     cpu0:  18.5 +/-  2.5  @ 1209 MHz
     cpu1:  14.0 +/-  3.1  @  991 MHz
 Overhead: [in % used of total CPU available]
  netperf:   2.1

However, whenever I run a test from a PC within that network, either using:

..

then I always get terrible results, as you can see. So clearly it's NOT working. I don't get it: why does it work on the router, but not within the network? I'm very confused.

Below are some of my params:

~#  cat /etc/config/sqm

config queue
        option interface 'pppoe-wan'
        option debug_logging '1'
        option verbosity '5'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option linklayer 'ethernet'
        option overhead '44'
        option download '57000'
        option upload '19000'
        option enabled '1'

~# ifstatus wan
{
        "up": true,
        "pending": false,
        "available": true,
        "autostart": true,
        "dynamic": false,
        "uptime": 76697,
        "l3_device": "pppoe-wan",
        "proto": "pppoe",
        "device": "eth0.2",
        "updated": [
                "addresses",
                "routes"
        ],
        "metric": 0,
        "dns_metric": 0,
        "delegation": false,
        "ipv4-address": [
                {
                        "address": "redacted",
                        "mask": 32,
                        "ptpaddress": "redacted"
                }
        ],
        "ipv6-address": [

        ],
        "ipv6-prefix": [

        ],
        "ipv6-prefix-assignment": [

        ],
        "route": [
                {
                        "target": "0.0.0.0",
                        "mask": 0,
                        "nexthop": "redacted",
                        "source": "0.0.0.0/0"
                }
        ],
        "dns-server": [
                "redacted",
                "redacted"
        ],
        "dns-search": [

        ],
        "neighbors": [

        ],
        "inactive": {
                "ipv4-address": [

                ],
                "ipv6-address": [

                ],
                "route": [

                ],
                "dns-server": [

                ],
                "dns-search": [

                ],
                "neighbors": [

                ]
        },
        "data": {

        }
}

~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 1358237647 bytes 2292327 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 358004825 bytes 1676722 pkt (dropped 0, overlimits 0 requeues 4)
 backlog 0b 0p requeues 4
  maxpacket 2488 drop_overlimit 0 new_flow_count 1335 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-guest root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth1.100 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-iot root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth1.70 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth1.1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-tnot root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth1.50 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-unot root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth1.60 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.2 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 802d: dev pppoe-wan root refcnt 2 bandwidth 19Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 44
 Sent 144231086 bytes 234101 pkt (dropped 509, overlimits 138332 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 48288b of 4Mb
 capacity estimate: 19Mbit
 min/max network layer size:           29 /    1492
 min/max overhead-adjusted size:       73 /    1536
 average network hdr offset:            0

                  Tin 0
  thresh         19Mbit
  target            5ms
  interval        100ms
  pk_delay         75us
  av_delay         13us
  sp_delay          6us
  backlog            0b
  pkts           234610
  bytes       144990514
  way_inds          942
  way_miss         1318
  way_cols            0
  drops             509
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len          2984
  quantum           579

qdisc ingress ffff: dev pppoe-wan parent ffff:fff1 ----------------
 Sent 413853718 bytes 328831 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1-1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1-2 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1-3 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 802e: dev ifb4pppoe-wan root refcnt 2 bandwidth 57Mbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100ms noatm overhead 44
 Sent 413495638 bytes 328591 pkt (dropped 240, overlimits 306529 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 92352b of 4Mb
 capacity estimate: 57Mbit
 min/max network layer size:           36 /    1492
 min/max overhead-adjusted size:       80 /    1536
 average network hdr offset:            0

                  Tin 0
  thresh         57Mbit
  target            5ms
  interval        100ms
  pk_delay         29us
  av_delay         11us
  sp_delay          7us
  backlog            0b
  pkts           328831
  bytes       413853718
  way_inds          935
  way_miss         1286
  way_cols            0
  drops             240
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len          1492
  quantum          1514

You might try the speedtest script in concurrent mode as well as sequential. It's a more challenging test, and more like actual high-load conditions, I believe. You might not be under the controlling threshold with your configured throttling speeds. Try lowering things a bit more to see if that cleans it up. Otherwise, there could be other issues to figure out.

I am under the controlling threshold. I started low and gradually tuned the rates upwards until they reached a few Mbit/s below the maximum speed of the line. I used the speeds reported by speedtest-netperf.sh to verify each time, and they followed the thresholds I configured.

How about in concurrent mode? I seem to remember having problems at a lower speed in that mode vs. sequential. Concurrent is how the real world should look.

But if you came up gradually from much lower rates, that probably rules that out. There may be some other issue at play. If you set the shaper to 3/4 or 1/2 of your speed, do you get a good Waveform bloat test result?
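Untested sketch, but halving the shaper should be as simple as something like this (option names taken from your /etc/config/sqm; 28500/9500 are just roughly half of your 57000/19000):

~# uci set sqm.@queue[0].download='28500'
~# uci set sqm.@queue[0].upload='9500'
~# uci commit sqm
~# /etc/init.d/sqm restart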

No, I just did the test again at 3/4, 1/2, and 1/10. Always the exact same end result on Waveform (at 1/10: https://www.waveform.com/tools/bufferbloat?test-id=000d57b3-5660-4775-ac51-f94d403ad431).

It's like the SQM settings are not even there. Results on Waveform are the same WITH and WITHOUT SQM. That's actually my OP question as well: I don't understand why it makes a difference when using the speedtest-netperf.sh test, but not when doing the Waveform or dslspeed one.

When testing in concurrent mode WITH SQM (first test) and WITHOUT SQM (second test):

xx@xx:~# speedtest-netperf.sh -H netperf-eu.bufferbloat.net -c ## WITH SQM ##
2021-09-02 23:28:03 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-eu.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
.............................................................
 Download:  52.39 Mbps
   Upload:  15.92 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  13.693
    10pct:  14.942
   Median:  15.911
      Avg:  15.870
    90pct:  16.563
      Max:  17.372
 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 58 samples]
     cpu0:  52.7 +/-  0.0  @ 1709 MHz
     cpu1:  24.0 +/-  3.3  @ 1611 MHz
 Overhead: [in % used of total CPU available]
  netperf:  23.2


xx@xx:~# speedtest-netperf.sh -H netperf-eu.bufferbloat.net -c  ## WITHOUT SQM ##
2021-09-02 23:30:47 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-eu.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
.............................................................
 Download:  53.33 Mbps
   Upload:  17.04 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  13.699
    10pct: 192.568
   Median: 232.477
      Avg: 221.834
    90pct: 252.431
      Max: 262.785
 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 58 samples]
     cpu0:  28.3 +/-  3.0  @ 1529 MHz
     cpu1:  24.7 +/-  4.4  @ 1501 MHz
 Overhead: [in % used of total CPU available]
  netperf:  17.0


Huh... very odd.

So, the difference is: the speedtest script runs a test from inside your router, only going out the WAN to the remote test site. All the other tests, of course, run from an external machine on the LAN and pass through the rest of your router/firewall/etc. So that implies there's something going on for traffic on that path that either is, or looks like, bloat.

Knowing more about your router setup and network would help. Could you describe it, and whether you're doing anything unusual compared to a basic out-of-the-box OpenWrt setup?

No, my network setup is very simple: the main router makes the pppoe-wan connection on the WAN port, and for this test the PC is directly connected to the LAN side of the router. I have VLANs defined of course, but for this case that is irrelevant, as the port is untagged on the br-lan network. I'm running an OpenWrt 21.189.60983-x snapshot and there is nothing particular about my build.

In any case, all my traffic should pass through pppoe-wan anyway, so why does the speedtest seem to indicate that SQM works while it clearly doesn't?

Even when I run a ping on the router, you can see it spiking as soon as I load the line by doing a speedtest somewhere in the network, so I guess the test above is just wrong. But then why does the bandwidth of the test follow the settings I put in SQM? Puzzling...

64 bytes from 8.8.8.8: seq=29 ttl=118 time=13.931 ms # running on the openwrt router
64 bytes from 8.8.8.8: seq=30 ttl=118 time=167.079 ms # now I start doing a speedtest from another PC in the network
64 bytes from 8.8.8.8: seq=31 ttl=118 time=222.270 ms
64 bytes from 8.8.8.8: seq=32 ttl=118 time=164.946 ms
64 bytes from 8.8.8.8: seq=33 ttl=118 time=186.736 ms
64 bytes from 8.8.8.8: seq=34 ttl=118 time=200.186 ms
64 bytes from 8.8.8.8: seq=35 ttl=118 time=202.441 ms
64 bytes from 8.8.8.8: seq=36 ttl=118 time=202.087 ms
64 bytes from 8.8.8.8: seq=37 ttl=118 time=207.869 ms
64 bytes from 8.8.8.8: seq=38 ttl=118 time=216.926 ms
64 bytes from 8.8.8.8: seq=39 ttl=118 time=213.908 ms
64 bytes from 8.8.8.8: seq=40 ttl=118 time=202.653 ms
64 bytes from 8.8.8.8: seq=41 ttl=118 time=229.193 ms
64 bytes from 8.8.8.8: seq=42 ttl=118 time=230.099 ms
64 bytes from 8.8.8.8: seq=43 ttl=118 time=217.559 ms
64 bytes from 8.8.8.8: seq=44 ttl=118 time=223.154 ms #Test Ends 
64 bytes from 8.8.8.8: seq=45 ttl=118 time=48.543 ms

Mmmh, what router model do you have and which ISP?

Also, could you please try simple.qos/fq_codel? In case of CPU overload, cake tends to deliver the configured shaper rate while increasing the latency, whereas simple.qos/fq_codel keeps latency low, but at a potentially steep rate-reduction cost...
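Something along these lines should do the switch (a sketch; the @queue[0] index assumes the single queue section from your /etc/config/sqm):

~# uci set sqm.@queue[0].qdisc='fq_codel'
~# uci set sqm.@queue[0].script='simple.qos'
~# uci commit sqm
~# /etc/init.d/sqm restart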

I tried all queuing disciplines and scripts; none of them appear to work. Something is happening when I select the eth0 switch, but it appears to make things worse, sadly. My router is ARMv7-based and my ISP provides VDSL2+ (hence the pppoe-wan connection).
Basically I'm ready to give up. I've read every article on this forum about it and tried different methods of measuring. I can't seem to make it work like it should, so I don't trust the solution and fear it might do more wrong than it solves.

Can you please repeat that test (ping) from a second client (not the router)?

(also make sure all forms of offloading are disabled)
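For example, something like this to inspect and switch off the common software offloads (needs the ethtool package; eth0 is just an example interface, repeat for your other physical interfaces):

~# ethtool -k eth0 | grep -i offload
~# ethtool -K eth0 gro off gso off tso off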

Also, use the Ookla speedtest, not netperf, for a while please...

Using speedtest:

I'm in fact running your NSS build for the RT2600AC, @anon50098793. I couldn't find a single option for offloading anywhere, even though that was also my first guess. Perhaps it's hidden somewhere?

Those pings I posted above are from another PC in the network. I've tried several PCs, actually, and they all confirm the same result: SQM simply does nothing for me, nor does my system log contain errors (SQM on debug level).

nss is 'special'... nss = offloading (bigtime)

search that thread for 'sqm script' and you should get some decent suggestions...

(please update here or there if you try something and it works good so I can copy you :grimacing: )

seriously tho'... if the scripts need to be tuned up a little to drop into /usr/lib/sqm/*.qos... I can probably do that if needed... but pretty sure kong is about 100 orders of magnitude more capable than I am... so start there first...

ping me on that thread if work needs doing...

Okay, are you sure that your end devices are actually connected to the OpenWrt router? In theory your DSL modem might act as its own router, and if your end devices connect via WiFi to that router, your SQM settings on the OpenWrt router simply will not matter, as the packets will never traverse SQM's instantiated traffic shaper at all.

But it looks like this is not the case...

Well, your on-router test shows SQM is working; the question is why traffic from your LAN ports is exempted.

Could you post your switch configuration, please?

And also post the output of:

  1. tc -s qdisc

  2. run a speedtest from your PC and link the results here (if you can, note the amount of data used per direction)

  3. tc -s qdisc

The idea is to figure out whether the speedtest increases cake's data counters by the expected magnitude...
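Roughly like this (a sketch, using the device names from your tc output):

~# tc -s qdisc show dev pppoe-wan > /tmp/qdisc_before
~# tc -s qdisc show dev ifb4pppoe-wan >> /tmp/qdisc_before
# now run the speedtest from the PC, noting the MB transferred per direction
~# tc -s qdisc show dev pppoe-wan > /tmp/qdisc_after
~# tc -s qdisc show dev ifb4pppoe-wan >> /tmp/qdisc_after
~# diff /tmp/qdisc_before /tmp/qdisc_after

If the "Sent ... bytes" counters barely move while the PC transfers hundreds of megabytes, the traffic is bypassing the shaper.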

No, it just shows that traffic not generated on the router is not seen/handled by SQM and hence your link gets saturated and you measure bufferbloat...

I am with @anon50098793 here, this smells like some sort of hardware offload (that might be restricted to traffic that is routed between WAN and LAN and hence does not affect router originated traffic)...

Thanks @anon50098793 and @moeller0. I suspected offloading as well, but couldn't find any settings related to it.

In the NSS thread referenced, it's indeed not working out of the box using the sqm-scripts package. When adding nss.qos and fq_codel, it starts working. So our previous assumption that it has something to do with the interface might be wrong, although it's very weird that the speedtest script followed the shaping when the traffic originated from the router itself.

I've marked the above answers as the solution because my tests seem to confirm this, and I replied in the NSS build thread with my findings. If anything useful comes out of it, I'll post it here so Google can find it as well :slight_smile:

Just to confirm and close this topic for good:

It was indeed the NSS offloading at fault. With fq_codel, egress shaping worked in the end, but ingress didn't. I switched from the NSS build to a standard build and now it's working perfectly with cake + piece_of_cake.qos (I tested multiple, but this came out best). Obviously CPU usage is a lot higher, but after enabling packet steering it stays well within limits, as my DSL line is not that fast anyway.
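For anyone finding this later: packet steering can be enabled in LuCI under the global network options or, if I remember correctly, from the CLI like this (a sketch, on 21.02-era builds):

~# uci set network.globals.packet_steering='1'
~# uci commit network
~# /etc/init.d/network restart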

I got A+ on all bufferbloat tests now. Sweet.
