SQM Download Speed slower than expected

Could you post the output of "tc -s qdisc" for the cake(420000) and the cake(500000) case? I want to see if/how the quantum variable changes in relation to the set speed. Like HTB/TBF's burst buffers, these can help cake keep the real interface fed during CPU shortages (but at the cost of more induced latency, so this needs to be seen as a trade-off).

Have you manually installed the most recent upstream version of sqm-scripts before this test? If not, please try... Also have a look at /usr/lib/sqm/defaults.sh: you will find variables that let you influence the buffer sizing via the duration required to empty that buffer at the configured bandwidth. It would be quite interesting to see whether increasing the buffer duration gives you back the apparently lost bandwidth.
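
To illustrate what "sizing by duration" means, here is a rough back-of-the-envelope sketch (the variable names are mine, not necessarily what defaults.sh exposes): the buffer in bytes is simply the shaper rate multiplied by the time you allow the buffer to take to drain.

RATE_KBPS=420000   # shaper rate as configured in /etc/config/sqm
BUF_DUR_MS=4       # assumed knob: how long the buffer may take to drain
BUF_BYTES=$(( RATE_KBPS * 1000 / 8 * BUF_DUR_MS / 1000 ))
echo "${BUF_DUR_MS} ms at ${RATE_KBPS} kbit/s = ${BUF_BYTES} bytes"   # -> 210000 bytes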

BTW, it is not the leaf qdisc component of cake or fq_codel, but either cake's built-in shaper component (for layer_cake/piece_of_cake) or HTB/TBF in simple/simplest, that actually incurs the high computational demands...

Does this change work with cake, or does it only work with HTB?

My upload speed also seems much slower than configured.
Cake is set to ~40 MBit/s; the speedtest shows ~30 MBit/s.
Hmm

tc -s qdisc show dev ifb4eth1
qdisc cake 8008: root refcnt 2 bandwidth 420Mbit besteffort dual-dsthost nat wash ingress no-ack-filter split-gso rtt 100.0ms noatm overhead 18 mpu 64
 Sent 566707793 bytes 387600 pkt (dropped 36, overlimits 470689 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 3210304b of 15140Kb
 capacity estimate: 420Mbit
 min/max network layer size:           46 /    1460
 min/max overhead-adjusted size:       64 /    1478
 average network hdr offset:           14

                  Tin 0
  thresh        420Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay         23us
  av_delay          5us
  sp_delay          1us
  backlog            0b
  pkts           387636
  bytes       566760857
  way_inds           13
  way_miss          482
  way_cols            0
  drops              36
  marks               0
  ack_drop            0
  sp_flows            0
  bk_flows            1
  un_flows            0
  max_len         14560
  quantum          1514
 tc -s qdisc show dev ifb4eth1
qdisc cake 800b: root refcnt 2 bandwidth 500Mbit besteffort dual-dsthost nat wash ingress no-ack-filter split-gso rtt 100.0ms noatm overhead 18 mpu 64
 Sent 3748 bytes 11 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 1984b of 15140Kb
 capacity estimate: 500Mbit
 min/max network layer size:           46 /     913
 min/max overhead-adjusted size:       64 /     931
 average network hdr offset:            1

                  Tin 0
  thresh        500Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay          3us
  av_delay          0us
  sp_delay          0us
  backlog            0b
  pkts               11
  bytes            3748
  way_inds            0
  way_miss            7
  way_cols            0
  drops               0
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len           927
  quantum          1514

Is it possible that the CMTS can detect if a device can't reach full speed and then throttles down the download? Like some kind of threshold: if a threshold is exceeded, increase the number of download channels in use?

This only works with HTB. Cake is much better at autoscaling things, but if the new buffer-sizing code in sqm-scripts solves the problem for simple.qos, then it might be possible to make this configurable in cake as well (though this will come at the cost of more latency and so will never be the preferred solution).

Well, what speedtest are you using?

No idea, sorry, but my guess is that it is micro-stalls that keep cake from actually using the configured bandwidth...

  • Speedtest from my provider
  • bufferbloat servers (betterspeedtest.sh)
  • iperf3 servers from speedtest.myloc.de

All show the same behavior.

Thanks. For discussions like this I really like the dslreports speedtest (configured and shared as described in https://forum.openwrt.org/t/sqm-qos-recommended-settings-for-the-dslreports-speedtest-bufferbloat-testing/2803), as this allows one to get a better view of how a link performs. But I assume that test will not give noticeably different results, just a more detailed view into them.

Hmm.
I don't know...
Upload is also weird.
Without SQM:
[SUM] 8.00-9.00 sec 4.82 MBytes 40.4 Mbits/sec
With SQM and the limit set to 41984:
[SUM] 7.00-8.00 sec 2.31 MBytes 19.4 Mbits/sec
With SQM and the limit set to 100000:
[SUM] 7.00-7.61 sec 3.00 MBytes 41.6 Mbits/sec

What is going on?

Hard to say.

I guess my point is, without knowing your /etc/config/sqm, the output of "tc -s qdisc" and a description of how you performed the above tests, all I can do is pure speculation (and Meltdown and Spectre probably reminded all of us that speculation might have side effects :wink: ).

For testing I used a simple config:

config queue
	option debug_logging '0'
	option verbosity '5'
	option linklayer 'none'
	option interface 'eth1.20'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
	option download '450560'
	option qdisc_advanced '0'
	option upload '41984'
	option enabled '1'

iperf3 30 sec upload test:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  7.46 MBytes  2.08 Mbits/sec  173             sender
[  5]   0.00-30.00  sec  7.39 MBytes  2.07 Mbits/sec                  receiver
[  7]   0.00-30.00  sec  7.44 MBytes  2.08 Mbits/sec  166             sender
[  7]   0.00-30.00  sec  7.39 MBytes  2.07 Mbits/sec                  receiver
[  9]   0.00-30.00  sec  7.42 MBytes  2.07 Mbits/sec  171             sender
[  9]   0.00-30.00  sec  7.37 MBytes  2.06 Mbits/sec                  receiver
[ 11]   0.00-30.00  sec  7.43 MBytes  2.08 Mbits/sec  168             sender
[ 11]   0.00-30.00  sec  7.38 MBytes  2.06 Mbits/sec                  receiver
[ 13]   0.00-30.00  sec  7.39 MBytes  2.07 Mbits/sec  169             sender
[ 13]   0.00-30.00  sec  7.35 MBytes  2.05 Mbits/sec                  receiver
[ 15]   0.00-30.00  sec  7.45 MBytes  2.08 Mbits/sec  165             sender
[ 15]   0.00-30.00  sec  7.40 MBytes  2.07 Mbits/sec                  receiver
[ 17]   0.00-30.00  sec  7.46 MBytes  2.09 Mbits/sec  172             sender
[ 17]   0.00-30.00  sec  7.42 MBytes  2.07 Mbits/sec                  receiver
[ 19]   0.00-30.00  sec  7.42 MBytes  2.08 Mbits/sec  166             sender
[ 19]   0.00-30.00  sec  7.38 MBytes  2.06 Mbits/sec                  receiver
[ 21]   0.00-30.00  sec  7.43 MBytes  2.08 Mbits/sec  171             sender
[ 21]   0.00-30.00  sec  7.39 MBytes  2.07 Mbits/sec                  receiver
[ 23]   0.00-30.00  sec  7.46 MBytes  2.09 Mbits/sec  158             sender
[ 23]   0.00-30.00  sec  7.43 MBytes  2.08 Mbits/sec                  receiver
[SUM]   0.00-30.00  sec  74.4 MBytes  20.8 Mbits/sec  1679             sender
[SUM]   0.00-30.00  sec  73.9 MBytes  20.7 Mbits/sec                  receiver

tc output after test:

qdisc cake 802b: root refcnt 2 bandwidth 41984Kbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 81238203 bytes 55223 pkt (dropped 1677, overlimits 95594 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 190400b of 4Mb
 capacity estimate: 41984Kbit
 min/max network layer size:           42 /    1474
 min/max overhead-adjusted size:       42 /    1474
 average network hdr offset:           14

                  Tin 0
  thresh      41984Kbit
  target          5.0ms
  interval      100.0ms
  pk_delay        7.2ms
  av_delay        4.0ms
  sp_delay          2us
  backlog            0b
  pkts            56900
  bytes        83710101
  way_inds            0
  way_miss           22
  way_cols            0
  drops            1677
  marks               0
  ack_drop            0
  sp_flows           12
  bk_flows            1
  un_flows            0
  max_len          7370
  quantum          1281

qdisc ingress ffff: parent ffff:fff1 ----------------
 Sent 2321814 bytes 42111 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

SQM set to 100000:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.01  sec  13.9 MBytes  3.89 Mbits/sec   35             sender
[  5]   0.00-30.01  sec  13.7 MBytes  3.83 Mbits/sec                  receiver
[  7]   0.00-30.01  sec  14.0 MBytes  3.90 Mbits/sec   32             sender
[  7]   0.00-30.01  sec  13.8 MBytes  3.87 Mbits/sec                  receiver
[  9]   0.00-30.01  sec  13.8 MBytes  3.85 Mbits/sec   33             sender
[  9]   0.00-30.01  sec  13.6 MBytes  3.81 Mbits/sec                  receiver
[ 11]   0.00-30.01  sec  16.8 MBytes  4.68 Mbits/sec   31             sender
[ 11]   0.00-30.01  sec  16.6 MBytes  4.63 Mbits/sec                  receiver
[ 13]   0.00-30.01  sec  12.6 MBytes  3.53 Mbits/sec   33             sender
[ 13]   0.00-30.01  sec  12.5 MBytes  3.51 Mbits/sec                  receiver
[ 15]   0.00-30.01  sec  14.1 MBytes  3.93 Mbits/sec   27             sender
[ 15]   0.00-30.01  sec  14.0 MBytes  3.91 Mbits/sec                  receiver
[ 17]   0.00-30.01  sec  14.4 MBytes  4.01 Mbits/sec   29             sender
[ 17]   0.00-30.01  sec  14.2 MBytes  3.97 Mbits/sec                  receiver
[ 19]   0.00-30.01  sec  14.4 MBytes  4.03 Mbits/sec   29             sender
[ 19]   0.00-30.01  sec  14.3 MBytes  4.00 Mbits/sec                  receiver
[ 21]   0.00-30.01  sec  15.1 MBytes  4.22 Mbits/sec   34             sender
[ 21]   0.00-30.01  sec  15.0 MBytes  4.20 Mbits/sec                  receiver
[ 23]   0.00-30.01  sec  12.6 MBytes  3.51 Mbits/sec   34             sender
[ 23]   0.00-30.01  sec  12.5 MBytes  3.49 Mbits/sec                  receiver
[SUM]   0.00-30.01  sec   142 MBytes  39.6 Mbits/sec  317             sender
[SUM]   0.00-30.01  sec   140 MBytes  39.2 Mbits/sec                  receiver

iperf Done.
qdisc cake 802e: root refcnt 2 bandwidth 100Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 154882204 bytes 105175 pkt (dropped 10, overlimits 150386 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 179Kb of 5000000b
 capacity estimate: 100Mbit
 min/max network layer size:           42 /    1474
 min/max overhead-adjusted size:       42 /    1474
 average network hdr offset:           14

                  Tin 0
  thresh        100Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay        1.0ms
  av_delay        413us
  sp_delay          2us
  backlog            0b
  pkts           105185
  bytes       154896944
  way_inds            2
  way_miss           23
  way_cols            0
  drops              10
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len          8844
  quantum          1514

qdisc ingress ffff: parent ffff:fff1 ----------------
 Sent 3072237 bytes 57369 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Hmm, I have the feeling my ISP is doing some kind of QoS/AQM.
When I disable SQM and saturate the upload, pings are around ~75 ms.
I would expect much higher latency on a saturated link.
But that is still too high for my taste.

Do you know by chance what the max channel bandwidth for DOCSIS 3.0/QAM16 is (upstream)?

You mean performance-wise?
I don't know how much it affects the ARM Cortex-A9.
I think it is not affected by Meltdown, only by Spectre v2.
But I don't think it will affect the shaping of a 40 MBit/s upload...

I wanted to give the updated sqm-scripts a go with HTB or HFSC.
But somehow only fq_codel and cake are showing up as available qdiscs?

//edit
Tried the old qos-scripts package from OpenWrt.
It shows the same behavior as the sqm package.
As soon as I enable QoS, the bandwidth is halved.

It was a joke; @moeller0 was saying he shouldn't speculate without data.

Your situation is almost as though the math is wrong, like something's getting multiplied by 2 in the packet size calculation...

:smiley:

Seems like the problem was the CPU scaling patch I recently added.
:expressionless:

But it seems like...
the segment here is overloaded.
Yesterday late at night I was able to set SQM to 100% of the sync rate
and had nice low pings while saturating the uplink.
Today pings already start to rise when upload speeds reach ~30 MBit/s.
Yeah, fast and good internet in Germany.
Sorry, we don't do that here...
I feel like going back in time.
:expressionless:

Sorry, this one should not have slipped past my quality control :wink:

Great, I get rewarded for being late, you solved the riddle yourself...

The joy of DOCSIS. Well, I take that back, as this is not really cable-specific; all shared-something techniques suffer from this*. The issue is segment size (measured in number of users) versus segment aggregate bandwidth. Now, at least with the prospect of DOCSIS 3.1 around the corner, you can hope that the mandatory PIE AQM in the modem will at least give you tolerable worst-case bufferbloat in the egress/uplink direction...

*) The question is not "is there a shared segment", but rather at what point in the network path the sharing starts. But DOCSIS traditionally has a less favorable split than GPON...

Yeah, the joy of shared media x)

I think it will take some time before DOCSIS 3.1 is available here for the broad range of users.
I read they tested 3.1 in some cities but only had problems.

Also, all upload channels are running with QAM16 modulation; in their support forum someone wrote that only a small number of connections are running with QAM64 modulation.

Would completely switching over to QAM64 give more bandwidth? More headroom?

Actually, I don't know what they are doing.
The installation down in the cellar looks like crap (the installation in the old house was also bad).
If I had the equipment I would fix it myself.
They use DS-Lite; if you were lucky you could get a native IPv4 connection.
Now they offer dual stack, but the IPv4 part is crippled to a 1460 MTU.
If you want only IPv4: sorry, we can't do that; but dual stack with IPv4 and IPv6 is no problem x)
They don't offer plain modems; all routers they give out have the Intel Puma 6 bug.
In other countries they operate in, they offer at least a bridge/modem mode.
But sorry, in Germany we can't do that either.
And now Vodafone wants to buy that company.
From bad to worse x)

I assume so; QAM16 only transmits 4 bits per symbol, while QAM64 uses 6 bits per symbol, so a QAM64 channel will have 1.5 times the bandwidth of a QAM16 one. I have very little recent experience with DOCSIS, but I believe QAM16 to be the worst-case uplink modulation; this is pretty terrible.
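
The arithmetic spelled out (just log2 of the constellation size, nothing DOCSIS-specific beyond that):

awk 'BEGIN { q16 = log(16)/log(2); q64 = log(64)/log(2);
             printf "QAM16: %g bits/symbol, QAM64: %g bits/symbol, ratio %g\n", q16, q64, q64/q16 }'
# prints: QAM16: 4 bits/symbol, QAM64: 6 bits/symbol, ratio 1.5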

So I assume this is UM; are they really using full dual stack, or rather DS-Lite? But even in the DS-Lite case (where each IPv4 packet incurs an additional 40-byte IPv6 header), I believe the idea was that the CPE-CMTS connection should use baby jumbo frames so that the MTU into the internet would still be 1500; just showing how naive my beliefs seem to be...

Well, personally I would never want IPv4 only; IPv6 is not only the future, but since the transition has already started it is also the present (plus IPv6 elegantly side-steps the nasty reachability-from-the-outside issues caused by CG-NAT).

Not even "fixed" firmware releases?

Not sure. I have heard great things about the DOCSIS section of Vodafone in Germany, including that they seem to allow customers with non-rented modems to choose between dual stack, DS-Lite and IPv4, and they seem to have a decent information policy towards end customers, so not everything in this coming change might be as bleak as you think...

BTW, this seems to be not uncommon: traffic shaping and CPU frequency scaling do not seem to harmonize very well. It could be that the shaper is bursty enough for the CPU to scale down prematurely, or maybe the scaling governors are not looking at the softirq load carefully enough...
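
One quick way to test that theory (standard Linux cpufreq sysfs paths; whether your build exposes them depends on the target):

# show the governor each core is currently using
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# temporarily pin everything to "performance" and rerun the speed test
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done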

And for the most part NAT64 works great. I think just go with IPv6 only; they'll shove the entirety of the IPv4 internet into a tiny corner of IPv6 and translate it for you... I've tested Android, Linux and Windows machines on an IPv6-only LAN with tayga on the router and it works well for most things. I do suspect a few games and the like will suffer. For those devices you can run a CLAT on the router and give out a few static IPv4s to the few machines that need them.
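
For reference, a minimal tayga.conf of the kind I mean (the tunnel address and pool below are placeholders, adjust to your own setup; 64:ff9b::/96 is the well-known NAT64 prefix):

tun-device nat64
ipv4-addr 192.168.255.1
prefix 64:ff9b::/96
dynamic-pool 192.168.255.0/24
data-dir /var/spool/tayga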

Or just use their DS-Lite, but have your router give out IPv4 static reservations only to the few devices that can't handle IPv6-only, game consoles etc.

Yes. They offer full dual stack now, but you have to add an option to your plan:
either the "Power-Upload" option (which obviously is useless because the segments overload everywhere) or the "Telefon Comfort" option.

I think the MTU is 1460 because the "main" gateway is IPv6 and the IPv4 part is handled by a different gateway. So they created a tunnel between the two?

They did, after 1 year or so x)

Let's see what time brings...

That could be an IPv6 tunnel (as the IPv6 header takes 40 bytes) or potentially a tool that reports the TCP maximum segment size (MSS) instead of the MTU (the 20-byte IPv4 header and the 20-byte TCP header are deducted from the MTU to get the MSS; I am simplifying here a bit).
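
Both readings happen to land on the same number, which is what makes it ambiguous:

MTU=1500
echo $(( MTU - 40 ))        # IPv4 carried inside an IPv6 tunnel: 1500 - 40-byte IPv6 header = 1460
echo $(( MTU - 20 - 20 ))   # a tool showing the MSS instead: 1500 - IPv4 header - TCP header = 1460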

Could be.

I never had DS-Lite (luckily I had a native IPv4 connection in the past and now dual stack).
From forum posts (unofficial ISP forum) I could infer that DS-Lite comes with some problems...
The IP changes a lot and no port forwards work (because the AFTR is doing the NAT?).
Now the question is...
Is it possible to configure an AFTR gateway to operate like a "normal" gateway?
So that it assigns a somewhat static IPv4 address and opens up the ports?
Then the "main" gateway has an IPv6 tunnel connection to the AFTR gateway, which serves the IPv4 connection?

I replaced my ISP router with a plain modem (not easy to get a EuroDOCSIS modem in Germany);
the connection is much, much better.
Fewer errors, better latency, more download channels (32 vs 24).
I set cake to the advertised speeds (400/40 MBit/s), which ends up at ~380/38 MBit/s (TCP/IP overhead?).
Works well, nice low pings of ~20 ms while the connection is saturated.
Only thing that bugs me a bit...
cake by default puts all ARP traffic into the high-priority tin.
On a DOCSIS connection that can be a lot of traffic/packets ending up in the high-priority tin;
I measured gigabytes of ARP traffic over a month, and most of it is useless anyway.
Maybe I'll create an arptables rule that drops all that unneeded traffic, or remove the ARP-to-CS7 mapping in cake.
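
A rough sketch of the arptables idea (WAN interface and address are placeholders for my setup, untested, just to show the shape of the rules):

# let ARP requests that actually target the router's WAN address through
arptables -A INPUT -i eth1 --opcode 1 -d 198.51.100.23 -j ACCEPT
# drop the rest of the segment-wide ARP request noise (opcode 1 = request)
arptables -A INPUT -i eth1 --opcode 1 -j DROP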