Either your ISP is very conscious of latency issues or there is fiber running to your local demarc, maybe both. It's most likely the latter. DOCSIS 3.1 with PIE takes care of the upload side, but not the download side. Who is your provider?
A very small local one using a DOCSIS cable connection; I've been very happy with them. They are expanding their fibre coverage in my area, but I really don't know if fiber is involved in my exact case. I know that the lower apartment in our two-storey house got fibre, but another provider did that installation.
What country are you based in? You may be able to log in to your modem at the standard 192.168.100.1 for a cable modem and view the DOCSIS version in use.
(Stereotypical Wiseguy Voice)
Picture this...
It's the wee hours of Sunday Morning.
I saw this beautiiiifull firmware that's most up to date and her name is cake
She had aqm for days if you know what I mean! hehe
I says HEY! How you doin?
She says to me "I'm ok, just looking for a nice guy to take me for a spin"
I says, "Well I might know such a guy, why don't yous' hop in my router and go for a little spin?"
She agrees, I click her link, soon enough we get to downloadin', and she's downloading pretty fast ya hear me?!
Now, she's a uploading and downloading for about an hour, like an animal.
I says "Stop all the downloadin' and show me what ya got!"
She then shows me her stats after it's all said and done, and just like that, I knew she was da one.
root@OpenWrt:~# tc -s qdisc show dev eth1.2
qdisc cake 800d: root refcnt 2 bandwidth 10Mbit besteffort dual-srchost nat wash ack-filter split-gso rtt 100.0ms noatm overhead 23
Sent 26798900 bytes 42631 pkt (dropped 83, overlimits 59424 requeues 0)
backlog 0b 0p requeues 0
memory used: 3220288b of 4Mb
capacity estimate: 10Mbit
min/max network layer size: 28 / 1500
min/max overhead-adjusted size: 51 / 1523
average network hdr offset: 14
Tin 0
thresh 10Mbit
target 5.0ms
interval 100.0ms
pk_delay 272.0ms
av_delay 226.6ms
sp_delay 3us
backlog 0b
pkts 42714
bytes 26812710
way_inds 34
way_miss 719
way_cols 0
drops 4
marks 14344
ack_drop 79
sp_flows 8
bk_flows 1
un_flows 0
max_len 1514
quantum 305
qdisc ingress ffff: parent ffff:fff1 ----------------
Sent 71227650 bytes 60026 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
root@OpenWrt:~# tc -s qdisc show dev ifb4eth1.2
qdisc cake 800e: root refcnt 2 bandwidth 45Mbit besteffort dual-dsthost nat wash ingress ack-filter split-gso rtt 100.0ms noatm overhead 23
Sent 72049764 bytes 60002 pkt (dropped 33, overlimits 94700 requeues 0)
backlog 0b 0p requeues 0
memory used: 371008b of 4Mb
capacity estimate: 45Mbit
min/max network layer size: 46 / 1500
min/max overhead-adjusted size: 69 / 1523
average network hdr offset: 14
Tin 0
thresh 45Mbit
target 5.0ms
interval 100.0ms
pk_delay 234us
av_delay 25us
sp_delay 2us
backlog 0b
pkts 60035
bytes 72068772
way_inds 34
way_miss 730
way_cols 0
drops 13
marks 218
ack_drop 20
sp_flows 1
bk_flows 1
un_flows 0
max_len 1514
quantum 1373
root@OpenWrt:~# sysctl -p
net.ipv4.tcp_ecn = 1
net.ipv4.tcp_ecn_fallback = 2
net.ipv4.tcp_moderate_rcvbuf = 0
net.ipv4.tcp_window_scaling = 0
net.core.rmem_default = 65536
net.core.rmem_max = 4194304
net.ipv4.tcp_rmem = 4096 65536 4194304
net.core.wmem_default = 65536
net.core.wmem_max = 4194304
net.ipv4.tcp_wmem = 4096 65536 4194304
net.ipv4.udp_mem = 4096 65536 4194304
Why did you turn off
net.ipv4.tcp_moderate_rcvbuf
and
net.ipv4.tcp_window_scaling
?
It doesn't affect forwarded traffic, but with modern internet speeds it is always better to have those options turned on.
net.ipv4.tcp_rmem = 4096 65536 4194304
net.ipv4.tcp_wmem = 4096 65536 4194304
That doesn't set the initial window size to 65536.
To actually advertise a 65536 window you need a buffer of 65536 * 4 / 3 ≈ 87380 (the default) when net.ipv4.tcp_adv_win_scale
is set to 2, since a quarter of the buffer is reserved for overhead.
To increase the max limits to 16 MB (which is the TCP max allowed window size?)
and have a default window of 65536:
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_adv_win_scale = 2
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
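The adv_win_scale arithmetic above can be sketched in Python. The helper below loosely mirrors the kernel's tcp_win_from_space() logic; it's a simplified model, assuming a positive scale reserves 1/2^scale of the buffer for bookkeeping:

```python
def tcp_win_from_space(space: int, adv_win_scale: int = 2) -> int:
    """Advertised receive window derived from the socket buffer size.

    Simplified model of the kernel's tcp_win_from_space(): a positive
    tcp_adv_win_scale reserves 1/2^scale of the buffer for overhead,
    a non-positive one advertises only 1/2^-scale of the buffer.
    """
    if adv_win_scale <= 0:
        return space >> -adv_win_scale
    return space - (space >> adv_win_scale)

# With the default tcp_rmem of 87380 and tcp_adv_win_scale = 2,
# the advertised window comes out to 3/4 of the buffer:
print(tcp_win_from_space(87380, 2))  # 65535
# A buffer of only 65536 would advertise a smaller window:
print(tcp_win_from_space(65536, 2))  # 49152
```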
Linux doubles whatever TCP buffer numbers it's given, in bytes, to account for additional processing overhead. So a default of 65536 will be 131072. Although technically it would be 65535.
There's no point in having such large buffers unless it's for UDP traffic. I'm not running a server, just a router. These settings allow for much lower cwnds and almost zero retransmissions.
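The doubling is easy to observe from userspace; a quick sketch (Linux-specific behaviour, other platforms may return the value unchanged):

```python
import socket

# On Linux the kernel doubles the value passed to SO_RCVBUF/SO_SNDBUF to
# account for bookkeeping overhead, so a requested 65536 reads back larger.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)
actual = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(actual)  # 131072 on Linux
s.close()
```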
Window scaling doesn't matter much at low bandwidths, but modern high-speed connections genuinely need large, scaled windows.
For example:
A 100 Mbit line with a latency of 30 ms needs a window size of at least ~375000 bytes.
If this line is upgraded to 1 Gbit @ 30 ms, the window already needs to be ~3750000 bytes (~3.58 MByte).
If the latency is very high, like 100 ms, the buffer needs to be ~12.5 MByte.
So why not set the maximum and let the system automatically scale it when needed?
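The arithmetic in those examples is just bandwidth times delay; a minimal sketch:

```python
def bdp_bytes(bandwidth_bps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_ms / 8000  # bits/s * ms -> bytes

print(bdp_bytes(100e6, 30))  # 375000.0   -> ~375 KB  for 100 Mbit @ 30 ms
print(bdp_bytes(1e9, 30))    # 3750000.0  -> ~3.58 MiB for 1 Gbit  @ 30 ms
print(bdp_bytes(1e9, 100))   # 12500000.0 -> ~12 MiB  for 1 Gbit  @ 100 ms
```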
Yes, you are right, the values are doubled now.
Because of net.ipv4.tcp_adv_win_scale = 1, it was set to 2 at some point.
In my area 100 Mbit is a dream! I do know some folks who have symmetrical 1 Gbit out in the boonies, go figure! Alas, I won't be seeing those figures anytime soon. I currently have a 50/10 VDSL2 line, no PPPoE, with bonding that works but not reliably.
I concur that a larger bandwidth would make window scaling advantageous, but as I've said, I find better results limiting the buffering to a more realistic limit. Autotuning doesn't seem to live up to its promise for me.
« SpeedGuide.net TCP Analyzer Results »
Tested on: 2020.02.16 13:55
IP address: 172.3.xx.xxx
Client OS/browser: Windows 10 (Chrome 79.0.3945.130)
TCP options string: 020405b40103030801010402
MSS: 1460
MTU: 1500
TCP Window: 131328 (not multiple of MSS)
RWIN Scaling: 8 bits (2^8=256)
Unscaled RWIN : 513
Recommended RWINs: 64240, 128480, 256960, 513920, 1027840
BDP limit (200ms): 5253kbps (657KBytes/s)
BDP limit (500ms): 2101kbps (263KBytes/s)
MTU Discovery: ON
TTL: 50
Timestamps: OFF
SACKs: ON
IP ToS: 00000000 (0)
But you have window scaling enabled?
Window > 65536 = Scaling enabled.
131328 is sufficient for 50Mbit@20ms
//edit
Maximum window size can be 1Gbyte and not 16Mbyte. (65536 * 2^14)
Thank you for the reminder; it should be 65535, not 65536, in the calculation.
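Both figures, the ~1 GByte maximum and the sufficiency of a 131328-byte window at 20 ms, are quick to check; a small sketch:

```python
# The 16-bit TCP window field (max 65535) shifted by the maximum scale
# factor of 14 (RFC 7323) gives the largest possible window, just shy of 1 GiB:
print(65535 << 14)  # 1073725440

# The window reported by the Analyzer, 131328 bytes, caps throughput at a
# given RTT; at 20 ms it still covers a 50 Mbit line:
window_bytes, rtt_ms = 131328, 20
mbit = window_bytes * 8 * 1000 / rtt_ms / 1e6
print(round(mbit, 1))  # 52.5
```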
net.ipv4.tcp_ecn=1
net.ipv4.tcp_ecn_fallback=2
net.core.rmem_default=87380
net.core.rmem_max=349520
net.ipv4.tcp_rmem=4096 87380 349520
net.core.wmem_default=87380
net.core.wmem_max=349520
net.ipv4.tcp_wmem=4096 87380 349520
net.ipv4.udp_mem=4096 87380 349520
net.ipv4.tcp_moderate_rcvbuf=0
net.ipv4.tcp_window_scaling=0
None of this affects anything other than connections from or to the router, right? Stuff like updating the package lists or installing packages... unless you run a Squid proxy or similar.
It affects both LAN and WAN connections. Most folks don't bother with these because of autotuning. Try setting these too low on a 1 Gbit connection with high latency and watch things fall apart.
But it doesn't affect any routed traffic; these kinds of options need to be set on the endpoints.
I thought so too at first, but it does. With new settings posted above:
« SpeedGuide.net TCP Analyzer Results »
Tested on: 2020.02.16 15:45
IP address: 172.3.xx.xxx
Client OS/browser: Windows 10 (Chrome 79.0.3945.130)
TCP options string: 020405b40103030801010402
MSS: 1460
MTU: 1500
**TCP Window: 262656 (not multiple of MSS)**
RWIN Scaling: 8 bits (2^8=256)
Unscaled RWIN : 1026
Recommended RWINs: 64240, 128480, 256960, 513920, 1027840
BDP limit (200ms): 10506kbps (1313KBytes/s)
BDP limit (500ms): 4202kbps (525KBytes/s)
MTU Discovery: ON
TTL: 50
Timestamps: OFF
SACKs: ON
IP ToS: 00000000 (0)
And changing back to the old ones and doing the test twice gives?
I wonder if that is partly driven by Windows 10's network-stack autotuning?
« SpeedGuide.net TCP Analyzer Results »
Tested on: 2020.02.16 18:36
IP address: 172.3.xx.xxx
Client OS/browser: Windows 10 (Chrome 79.0.3945.130)
TCP options string: 020405b40103030801010402
MSS: 1460
MTU: 1500
TCP Window: 131328 (not multiple of MSS)
RWIN Scaling: 8 bits (2^8=256)
Unscaled RWIN : 513
Recommended RWINs: 64240, 128480, 256960, 513920, 1027840
BDP limit (200ms): 5253kbps (657KBytes/s)
BDP limit (500ms): 2101kbps (263KBytes/s)
MTU Discovery: ON
TTL: 50
Timestamps: OFF
SACKs: ON
IP ToS: 00000000 (0)
A capture shows scaling is still in effect...
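The TCP options string from the Analyzer output can in fact be decoded to confirm this; a minimal sketch of a decoder (option kinds per RFC 793 and RFC 7323):

```python
def decode_tcp_options(hexstr: str) -> dict:
    """Decode a SpeedGuide-style TCP options hex string into named options."""
    data = bytes.fromhex(hexstr)
    opts, i = {}, 0
    while i < len(data):
        kind = data[i]
        if kind == 0:    # End of option list
            break
        if kind == 1:    # NOP padding
            i += 1
            continue
        length = data[i + 1]
        body = data[i + 2:i + length]
        if kind == 2:    # Maximum segment size
            opts["mss"] = int.from_bytes(body, "big")
        elif kind == 3:  # Window scale shift count
            opts["window_scale"] = body[0]
        elif kind == 4:  # SACK permitted
            opts["sack_permitted"] = True
        elif kind == 8:  # Timestamps
            opts["timestamps"] = True
        i += length
    return opts

opts = decode_tcp_options("020405b40103030801010402")
print(opts)  # {'mss': 1460, 'window_scale': 8, 'sack_permitted': True}
```

The window_scale of 8 (2^8 = 256) matches the "RWIN Scaling: 8 bits" line in the report, so scaling was still negotiated.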
I'm telling you, these affect the kernel of the router, which means they affect any stream that terminates on the router, such as when you download packages, or maybe during DNS lookups and other things your router does.
Anything you do from the LAN is affected by the behavior of the OS on the LAN machine.
For example, the wmem and rmem options are for write and read buffers... but only the endpoints write to or read from streams; the router just passes packets from one interface to another.
Which, as far as I can tell, is a good thing. Window scaling was introduced not to cater to slow short links, but to allow fully saturating long and/or fast links; it is all about allowing enough in-flight data to not be limited by the TCP window. The core idea is the bandwidth-delay product (BDP), which tells you the volume of traffic that needs to be in flight if you want to saturate a link with a given bandwidth and RTT; anything that increases either the delay or the bandwidth also increases the BDP and hence requires larger windows. With your access link you can rule out bandwidth becoming a problem, but you cannot control delay, so I wonder what improvement you expect from disabling window scaling in principle?