How to optimize throughput and TCP network efficiency

jarodvip · July 29, 2018, 7:06am

I tuned the following sysctl.conf settings to increase forwarding throughput and TCP/UDP efficiency.

jarodvip · July 29, 2018, 7:07am

kernel.panic=3
kernel.core_pattern=/tmp/%e.%t.%p.%s.core

net.ipv4.conf.default.arp_ignore=1
net.ipv4.conf.all.arp_ignore=1
net.ipv4.ip_forward=1
net.ipv4.icmp_echo_ignore_broadcasts=1
net.ipv4.icmp_ignore_bogus_error_responses=1
net.ipv4.igmp_max_memberships=100
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_dsack=1

#net.ipv6.conf.default.forwarding=1
#net.ipv6.conf.all.forwarding=1

net.netfilter.nf_conntrack_acct=1
net.netfilter.nf_conntrack_checksum=0
net.netfilter.nf_conntrack_max=65535
net.netfilter.nf_conntrack_tcp_timeout_established=7440
net.netfilter.nf_conntrack_udp_timeout=60
net.netfilter.nf_conntrack_udp_timeout_stream=180

# disable bridge firewalling by default
#net.bridge.bridge-nf-call-arptables=0
#net.bridge.bridge-nf-call-ip6tables=0
#net.bridge.bridge-nf-call-iptables=0

net.core.rmem_default = 256960
net.core.rmem_max = 513920
net.core.wmem_default = 256960
net.core.wmem_max = 513920
net.core.netdev_max_backlog = 2000
net.core.somaxconn = 2048
net.core.optmem_max = 81920
net.ipv4.tcp_mem = 131072  262144  524288
net.ipv4.tcp_rmem = 8760  256960  4088000
net.ipv4.tcp_wmem = 8760  256960  4088000
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 1024  65000
net.ipv4.tcp_max_syn_backlog = 2048

moeller0 · July 29, 2018, 7:38pm

Question: is it really helpful to have that many more conntrack entries than NAT "slots" (I believe OpenWrt allows 16K NAT entries by default)?

stangri · July 30, 2018, 12:53am

Another question: can you post the router details and the results of some relevant tests before and after the change?

mindwolf · November 1, 2019, 12:52am

The tcp_rmem values look wrong to my eye. NOTE: the values are in bytes, and Linux will double what you enter, e.g. 65536 is actually 131072.
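
The doubling is documented in socket(7) for buffer sizes requested per socket with setsockopt(SO_RCVBUF); whether it also applies to values written to sysctl.conf is not established in this thread. A minimal Python sketch for observing it on a Linux machine (the helper name rcvbuf_after_request is made up for illustration):

```python
import socket


def rcvbuf_after_request(requested: int) -> int:
    """Request a receive buffer of `requested` bytes and return what the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 65536
    # The reported value depends on the OS and on net.core.rmem_max.
    print(f"requested {requested}, kernel reports {rcvbuf_after_request(requested)}")
```

On Linux the reported value is typically twice the request as long as the request stays below net.core.rmem_max; other operating systems may report the request unchanged.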

NoTengoBattery · November 1, 2019, 1:57am

Other things you can do:

raise the CPU frequency to reduce latency (as Linksys did by running the CPU at max)
build a preemptive kernel (as Linksys did)
build an idle tickless (OpenWrt default) kernel at 500 Hz (as Linksys did, although they use 1000 Hz)
enable and use BBR congestion control as default

Also: be careful about the keepalive because it can easily be used as a DDoS vector.

shm0 · November 1, 2019, 5:16am

https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt

nf_conntrack_buckets - INTEGER
	Size of hash table. If not specified as parameter during module
	loading, the default size is calculated by dividing total memory
	by 16384 to determine the number of buckets but the hash table will
	never have fewer than 32 and limited to 16384 buckets. For systems
	with more than 4GB of memory it will be 65536 buckets.
	This sysctl is only writeable in the initial net namespace.

On systems with total memory between 256 MB and 4 GB,
this setting defaults to the maximum value of 16384.

nf_conntrack_max - INTEGER
	Size of connection tracking table.  Default value is
	nf_conntrack_buckets value * 4.

16384 * 4 = 65536

But is it actually useful to have such a high limit? I don't know.
For a small network maybe not.
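
The default sizing quoted above can be sketched in a few lines of Python. This follows only the wording of the quoted documentation (memory divided by 16384, clamped to 32..16384, and 65536 above 4 GB); actual kernels may differ between versions, and the function names are made up for illustration.

```python
def default_conntrack_buckets(total_mem_bytes: int) -> int:
    """Default hash table size as described in the quoted nf_conntrack text."""
    four_gib = 4 * 1024**3
    if total_mem_bytes > four_gib:
        return 65536
    # total memory / 16384, never fewer than 32, limited to 16384
    return max(32, min(16384, total_mem_bytes // 16384))


def default_conntrack_max(total_mem_bytes: int) -> int:
    """Default nf_conntrack_max: nf_conntrack_buckets * 4."""
    return 4 * default_conntrack_buckets(total_mem_bytes)


if __name__ == "__main__":
    for mib in (64, 128, 256, 512, 8192):
        mem = mib * 1024**2
        print(mib, "MiB ->", default_conntrack_buckets(mem), "buckets,",
              default_conntrack_max(mem), "max entries")
```

Under these assumptions a 128 MiB router ends up with 8192 buckets and 32768 entries, while anything from 256 MiB up to 4 GB gets the 16384 * 4 = 65536 figure.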

Config with some more aggressive timeouts:

net.netfilter.nf_conntrack_acct = 1
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_timestamp = 1
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_buckets = 16384
net.netfilter.nf_conntrack_expect_max = 64
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 10
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 5
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 5
net.netfilter.nf_conntrack_udp_timeout = 10
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_gre_timeout = 10
net.netfilter.nf_conntrack_gre_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 10
net.netfilter.nf_conntrack_icmpv6_timeout = 10

And for the *rmem, *wmem settings, I think that depends on the connection speed at max RTT?
A 1 Gbps connection over a worst-case 100 ms link would need a ~12.5 MByte buffer.
I hope I calculated that correctly.
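
The arithmetic behind that estimate is the bandwidth-delay product. A small Python sketch, using integer math to avoid rounding surprises (bdp_bytes is an illustrative name, not anything from the kernel):

```python
def bdp_bytes(rate_bit_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: rate [bit/s] * RTT [s] / 8."""
    # rtt_ms / 1000 converts to seconds, / 8 converts bits to bytes
    return rate_bit_per_s * rtt_ms // 8000


if __name__ == "__main__":
    print(bdp_bytes(1_000_000_000, 100))  # 12500000 bytes, i.e. 12.5 MByte
```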

And do the net.ipv4.tcp_*mem settings not actually override the net.core.*mem settings?

moeller0 · November 1, 2019, 8:47am

Quick sidenote:

The netfilter/conntrack settings seem relevant for a router*, but the net.ipv4.tcp* parameters should only affect TCP connections terminating on the router itself. That covers things like OpenVPN/WireGuard traffic terminating on the router, HTTP/FTP services offered from the router, or a proxy like squid; for normal routing/NAT/firewalling they will have next to no effect on performance.

*) I have no opinion on the actual values though; I would guess the OpenWrt defaults might not reflect current best practice (but from first-hand experience they also do not seem to be catastrophically off). Does anybody here have numbers or a simple test procedure that could be used to benchmark different settings with regard to performance and robustness?

shm0 · November 1, 2019, 10:54am

Maybe something like this:
Let's assume a 50Mbit/10Mbit WAN connection with an average RTT of 20ms,
and the router itself also hosts some services like FTP on the LAN interfaces (1Gbit@1ms).

I guess it's best to calculate the worst-case/largest window size first.
And then get the smallest window size(s) needed.

For the above example, the largest window size needed is
~1250000 bytes (1Gbit@10ms, assuming worst-case RTT).
Make this a multiple of 1024 (a multiple of the MSS is better) = 1249280 bytes. (Both directions, rmem/wmem)

The smallest window size needed for the receive side (rmem) is
~125000 bytes (1Gbit@1ms, also fits 50Mbit@20ms).
Make this a multiple of 1024 = 124928 bytes.

The smallest window size needed for the send side (wmem) is
~25000 bytes (10Mbit@20ms).
Make this a multiple of 1024 = 24576 bytes.

So:
net.ipv4.tcp_rmem = 4096 124928 1249280
net.ipv4.tcp_wmem = 4096 24576 1249280

net.core.rmem_max = 1249280
net.core.wmem_max = 1249280
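
The numbers above can be reproduced with a short Python sketch: compute each bandwidth-delay product, round it down to a multiple of 1024, and print the resulting sysctl lines. The helper names are illustrative, and the 4096 minimum is simply copied from the values in this post.

```python
def bdp_bytes(rate_bit_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes."""
    return rate_bit_per_s * rtt_ms // 8000


def round_down(value: int, multiple: int = 1024) -> int:
    """Round value down to the nearest multiple."""
    return value // multiple * multiple


GBIT = 1_000_000_000
MBIT = 1_000_000

# Largest window: LAN at 1 Gbit with a worst-case RTT of 10 ms.
max_window = round_down(bdp_bytes(GBIT, 10))
# Receive default: 1 Gbit at 1 ms and 50 Mbit at 20 ms give the same product.
rmem_default = round_down(max(bdp_bytes(GBIT, 1), bdp_bytes(50 * MBIT, 20)))
# Send default: 10 Mbit upstream at 20 ms.
wmem_default = round_down(bdp_bytes(10 * MBIT, 20))

print(f"net.ipv4.tcp_rmem = 4096 {rmem_default} {max_window}")
print(f"net.ipv4.tcp_wmem = 4096 {wmem_default} {max_window}")
print(f"net.core.rmem_max = {max_window}")
print(f"net.core.wmem_max = {max_window}")
```

Changing the link rates or RTTs at the top gives values for a different connection without redoing the rounding by hand.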

For net.core.*mem_default, I'm not sure.

Hmm, do extremely large buffers increase latency/bufferbloat?

//edit
It seems that net.ipv4.tcp_rmem/tcp_wmem are not the TCP receive/send windows themselves.

Default: 87380 bytes. This value results in window of 65535 with
default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit
less for default tcp_app_win
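
The relation between 87380 bytes of buffer and a 65535-byte window follows the tcp_adv_win_scale rule from the kernel's ip-sysctl documentation: a positive scale reserves bytes/2^scale as overhead. A Python sketch, assuming tcp_adv_win_scale = 2 (the value that reproduces the 65535 figure; check your own system's setting) and ignoring tcp_app_win:

```python
def window_from_space(space: int, adv_win_scale: int) -> int:
    """Advertised window obtainable from `space` bytes of receive buffer."""
    if adv_win_scale <= 0:
        # Overhead is bytes - bytes/2^(-scale), so the window is bytes/2^(-scale).
        return space >> -adv_win_scale
    # Overhead is bytes/2^scale.
    return space - (space >> adv_win_scale)


if __name__ == "__main__":
    print(window_from_space(87380, 2))  # 65535
```

This is why the buffer sizes in tcp_rmem need to be larger than the window you actually want to advertise.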

Also, according to the kernel documentation, the tcp_rmem/tcp_wmem max values do not override net.core.rmem_max/wmem_max.
By default, net.core.rmem_max/wmem_max is 163840 on my system,
and the tcp_rmem/tcp_wmem max value is 4053472.
163840 may not be enough for a high-speed internet link.

moeller0 · November 1, 2019, 11:10am

My point is that these windows will do nothing for WAN traffic unless that WAN traffic terminates at the router. For internal networking, sure, it makes sense to adjust the limits, assuming window scaling does not automatically take care of that. I would always try to find a way to measure the effect of any such changes and only keep those that improve things.

Again this is not relevant for handling packets that are just passed on...

That depends: dumb buffers certainly increase latency under load, while well-managed buffers tend not to. (For TCP, one option is to keep the OS buffers small by applying back-pressure to the application; the application then needs its own buffering, but it is in control of how much buffering it is willing to accept, given the relationship between buffer sizes and worst-case latencies under load.)

shm0 · November 1, 2019, 11:14am

Yes, that's true, those buffers don't affect forwarded traffic because clients will set their own buffer sizes.
Hmm, actually the default window size of 16384 bytes should fit most cases; increasing it will only help transfer speeds ramp up faster? The more important thing is the max window size.

Btw, is Windows (at least Win7) limited to a window size of 524288 bytes by default?
From what I've read on some sites, Windows seems to use a default window size of 65536 and allows scaling by x8 (can that be increased to x14?).

moeller0 · November 1, 2019, 2:23pm

Sorry, no idea. I also no longer have a Win7 machine available...

user674574 · September 8, 2020, 3:35pm

With 1000/1000 Mbps symmetrical fiber and an x86_64 router (8 GB RAM, 4-core i7-6500U) running basically everything through a WireGuard client, are there values I would get a performance boost from?

Thank you

EDIT: I read a post on Reddit where it's recommended to use
net.core.wmem_default=65536
net.core.wmem_max=16777216
net.ipv4.tcp_wmem=4096 65536 16777216

Are these values in KByte? Then I guess 16777216 is too much, but maybe 2097152 or 1048576? What are the defaults? My RAM usage right now is 100 MB, so...

mindwolf · October 21, 2020, 2:10pm

I've read MANY resources looking for the correct format to apply. Some say the values are in pages and some say in raw kilobytes, as in the bandwidth-delay product.
E.g. an IBM article says to write it as
net.ipv4.tcp_wmem=4096 87380 (65535x4/3) 16777216 (16x1024x1024 or 16MB)
Another article says to set it as:
rate * # of ports * 2 (1x1024x1024x4x2) = 8388608 or 8MB

I would say start low and run some iperf tests to find the smallest buffer that gives the best combination of link utilization and low latency, OR just try the calculations below.

RTT * Link Rate / 8 = [ bytes max ]
RTT * Link Rate / 8 / sqrt of max no. of flows = [ bytes default ]
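
As a rough Python sketch of the two formulas above (the function name and the choice of milliseconds and bit/s as input units are assumptions made for this example):

```python
import math


def buffer_sizes(rtt_ms: int, link_rate_bit_per_s: int, max_flows: int) -> tuple:
    """Return (bytes max, bytes default) from RTT, link rate and max number of flows."""
    # RTT * Link Rate / 8, with RTT converted from milliseconds to seconds
    bytes_max = rtt_ms * link_rate_bit_per_s // 8000
    # ... divided by the square root of the max number of flows
    bytes_default = int(bytes_max / math.sqrt(max_flows))
    return bytes_max, bytes_default


if __name__ == "__main__":
    # Example: 50 Mbit link, 20 ms RTT, at most 16 concurrent flows.
    print(buffer_sizes(20, 50_000_000, 16))  # (125000, 31250)
```

The results are starting points for the max and default fields; as suggested above, iperf measurements should decide the final values.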