How to increase the kernel TCP/UDP hash table entries parameter

xiaobo · January 22, 2019, 10:57am

NET: Registered protocol family 2
TCP established hash table entries: 32768 (order: 6, 262144 bytes)
TCP bind hash table entries: 32768 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 32768 bind 32768)
UDP hash table entries: 2048 (order: 4, 65536 bytes)
UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes)
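The numbers in these boot messages are linked: each table is allocated as 2^order pages, so the byte count divided by the entry count gives the per-bucket size. Below is a minimal Python sketch that parses lines of this form and derives both values. It assumes 4096-byte pages, as on x86. To enlarge these tables at boot, the kernel's thash_entries= and uhash_entries= command-line parameters set the TCP and UDP bucket counts. How a given OpenWrt build passes them, for example through the GRUB kernel command line on x86, is something to verify.

```python
import math
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86

# Matches e.g. "TCP established hash table entries: 32768 (order: 6, 262144 bytes)"
LINE_RE = re.compile(
    r"^(?P<name>.+?) hash table entries: (?P<entries>\d+) "
    r"\(order: (?P<order>\d+), (?P<bytes>\d+) bytes"
)


def parse_hash_line(line):
    """Return (name, entries, order, total_bytes), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return (m["name"], int(m["entries"]), int(m["order"]), int(m["bytes"]))


def describe(line):
    """Derive bytes per bucket and recompute the page order from the byte count."""
    name, entries, order, total = parse_hash_line(line)
    pages = total // PAGE_SIZE
    return {
        "name": name,
        "entries": entries,
        "bytes_per_bucket": total // entries,
        # The allocation is 2**order pages, so this should equal the logged order.
        "order_check": int(math.log2(pages)),
    }


boot_log = """\
TCP established hash table entries: 32768 (order: 6, 262144 bytes)
TCP bind hash table entries: 32768 (order: 7, 524288 bytes)
UDP hash table entries: 2048 (order: 4, 65536 bytes)
"""

for line in boot_log.splitlines():
    info = describe(line)
    print(info["name"], info["entries"], info["bytes_per_bucket"], info["order_check"])
```

On this log it prints 8, 16 and 32 bytes per bucket for the three tables, and the recomputed orders match the logged 6, 7 and 4.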

jeff · January 22, 2019, 2:23pm

Why are "info" log messages of concern?

Not many applications of OpenWrt to consumer-grade hardware should need to handle more than 32k simultaneous connections.

If you've really got a problem, I'd probably buy different hardware (only half kidding). These limits should be adjustable through the usual sysctls at run time. Things like net.core.somaxconn and the TCP backlog settings are perhaps the ones to start with. The net.netfilter sysctls may also need adjustment.
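Each dotted sysctl name corresponds to a file under /proc/sys, with the dots replaced by slashes. Below is a small sketch for checking current values before changing them. It is a read-only illustration. One limitation: names whose components themselves contain dots, such as VLAN interfaces like eth0.2, are not handled by this simple split.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name such as net.core.somaxconn to its /proc/sys file."""
    # Limitation: components that contain dots themselves are not supported.
    return Path(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the sysctl value as a string, or None if it is missing or unreadable."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for name in ("net.core.somaxconn",
                 "net.ipv4.tcp_max_syn_backlog",
                 "net.netfilter.nf_conntrack_max"):
        print(name, "=", read_sysctl(name))
```

net.netfilter.nf_conntrack_max only appears once the nf_conntrack module is loaded, so None there usually means conntrack is not active.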

xiaobo · January 23, 2019, 4:22am

@jeff The hardware is an Intel Xeon E3-12xx v2 (Ivy Bridge) with 3898904 kB / 3950640 kB (98%) of memory:
[ 0.095980] smpboot: CPU0: Intel Xeon E3-12xx v2 (Ivy Bridge)
[ 0.100000] Performance Events: unsupported p6 CPU model 58 no PMU driver, software events only.
[ 0.100114] Hierarchical SRCU implementation.
[ 0.104421] smp: Bringing up secondary CPUs ...
[ 0.108760] x86: Booting SMP configuration:

jeff · January 23, 2019, 5:10am

I've got an E3-1265Lv2 in a Lanner box myself so I'm very familiar with the processor and its capabilities. I've got to question the wisdom of running a 4-core, 8-thread CPU with an extended instruction set under OpenWrt, an OS optimized for resource-constrained, consumer-grade, SoC-based, all-in-one wireless routers, in what, if you've got over 32k established TCP connections, is likely an enterprise or commercial environment.

It's also still not clear why the hash-table sizes indicated are inadequate.

That aside, Linux kernel is Linux kernel, and the standard kernel sysctls apply.

xiaobo · January 23, 2019, 9:50am

@jeff In a business environment, inexpensive MikroTik products are good choices for small businesses, and a Huawei NetEngine 5000E or Cisco ASR 9000 suits large ones. Because OpenWrt is free software, it suits non-profit volunteer projects such as running an NTP server at home. That is not a business environment, but a server that joins the NTP Pool (https://www.ntppool.org) often has more than 65k established connections.

# Kernel parameters can be modified after boot,
# but how do I change the network parameters set during boot? GRUB parameters?

# cat /etc/sysctl.conf
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 87380 4194304
net.ipv4.tcp_mem = 4194304 4194304 4194304
net.ipv4.udp_mem = 2097152 2097152 2097152
net.netfilter.nf_conntrack_max = 264192
net.netfilter.nf_conntrack_buckets = 264192
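The units in this file differ from setting to setting. tcp_rmem and tcp_wmem are per-socket sizes in bytes. According to the tcp(7) and udp(7) man pages, however, tcp_mem and udp_mem are counted in pages. Below is a small Python sketch that parses lines in this format and converts the page-based values to bytes. It assumes 4096-byte pages, as on x86, and numeric values. By that arithmetic, tcp_mem = 4194304 pages is 16 GiB, well above the roughly 3.9 GB of RAM reported earlier in the thread, so those global thresholds would likely never come into play.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (x86)

# tcp_mem and udp_mem are in pages; tcp_rmem/tcp_wmem are per-socket bytes.
PAGE_UNIT_KEYS = {"net.ipv4.tcp_mem", "net.ipv4.udp_mem"}


def parse_sysctl_conf(text):
    """Parse 'key = v1 v2 ...' lines into {key: [int, ...]}, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = [int(v) for v in value.split()]  # assumes numeric values
    return settings


def to_bytes(key, values):
    """Convert page-based settings to bytes; byte-valued settings are returned unchanged."""
    factor = PAGE_SIZE if key in PAGE_UNIT_KEYS else 1
    return [v * factor for v in values]


conf = """\
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_mem = 4194304 4194304 4194304
net.ipv4.udp_mem = 2097152 2097152 2097152
"""

for key, values in parse_sysctl_conf(conf).items():
    print(key, [f"{b / 2**30:.3f} GiB" for b in to_bytes(key, values)])
```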

jeff · January 23, 2019, 1:46pm

That makes some sense then, especially as NTP is UDP. You might want to reduce the conntrack persistence time, as NTP is effectively a single-packet protocol whose exchanges complete almost instantly. The likely candidate is net.netfilter.nf_conntrack_udp_timeout. man udp describes the net.ipv4.udp_* sysctls.
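To see why the timeout matters, apply Little's law. In steady state, the number of live conntrack entries is roughly the rate of new flows multiplied by how long each entry survives. For short NTP exchanges that survival time is dominated by the UDP timeout. Below is a sketch of that arithmetic. The 5000 exchanges per second is a hypothetical load, not a figure from this thread.

```python
import math


def steady_state_entries(new_flows_per_sec, timeout_sec):
    """Little's law: live entries ~= arrival rate x lifetime of each entry."""
    return math.ceil(new_flows_per_sec * timeout_sec)


def max_timeout_for(max_entries, new_flows_per_sec):
    """Longest per-entry timeout (whole seconds) that keeps the table within max_entries."""
    return math.floor(max_entries / new_flows_per_sec)


rate = 5000  # hypothetical: new NTP client exchanges per second
for timeout in (30, 10):
    print(f"{timeout} s timeout -> {steady_state_entries(rate, timeout)} entries")
print("largest timeout fitting 264192 entries:", max_timeout_for(264192, rate), "s")
```

Cutting the timeout shrinks the table proportionally, which is why it is usually cheaper than raising nf_conntrack_max.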

If that can't get the conntrack system to behave properly and you still need conntrack (after all, if the server only provides NTP, static firewall rules around UDP ports 53 and 123 are probably sufficient), https://wiki.khnet.info/index.php/Conntrack_tuning suggests that the hash table size can be modified through net.netfilter.nf_conntrack_buckets.
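Raising the bucket count is cheap. The bucket array only holds one list head per bucket. The conntrack entries themselves are allocated separately as connections appear, and nf_conntrack_max caps their number. Below is a sketch of the array's memory cost. It assumes a 64-bit kernel, where each bucket head is a single 8-byte pointer.

```python
POINTER_SIZE = 8  # assumption: 64-bit kernel, one pointer per bucket head


def bucket_array_bytes(buckets, pointer_size=POINTER_SIZE):
    """Memory used by the conntrack hash bucket array itself (not the entries)."""
    return buckets * pointer_size


size = bucket_array_bytes(264192)
print(size, "bytes =", size / 2**20, "MiB")
```

For the 264192 buckets configured above, this comes to about 2 MiB, small next to the memory the entries themselves can consume.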

rtau · January 23, 2019, 2:49pm

If conntrack is a problem for ntp, would it be possible to just bypass conntrack for ntp traffic?

I imagine it would be a single iptables rule at the top of the input and output chains.
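With iptables, such exemptions usually go in the raw table's PREROUTING and OUTPUT chains rather than the filter table. Conntrack processes a packet after the raw table but before filter. Below is a sketch that only builds and prints the commands; it does not run them. Several assumptions apply. The raw table must be available on the OpenWrt build. The server listens on UDP port 123. The filter rules must then accept port 123 explicitly, because untracked packets never match ESTABLISHED. The same rules would be needed with ip6tables for IPv6.

```python
import shlex


def ntp_notrack_rules(port=123):
    """Build iptables argv lists that exempt local NTP server traffic from conntrack."""
    return [
        # Inbound client queries to the local NTP server.
        ["iptables", "-t", "raw", "-I", "PREROUTING", "-p", "udp",
         "--dport", str(port), "-j", "CT", "--notrack"],
        # Outbound replies from the local NTP server.
        ["iptables", "-t", "raw", "-I", "OUTPUT", "-p", "udp",
         "--sport", str(port), "-j", "CT", "--notrack"],
    ]


for argv in ntp_notrack_rules():
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Untracked packets never create nf_conntrack entries, so NTP traffic handled this way cannot fill the table.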

xiaobo · January 28, 2019, 12:05pm

I don't know why the PPPoE link drops automatically when the number of connections reaches 471832.
And what does "net_ratelimit: 102 callbacks suppressed" mean?

#  cat /proc/sys/kernel/printk_ratelimit
5
# cat /proc/sys/kernel/printk_ratelimit_burst
10
# cat /proc/sys/net/core/message_cost
5
# cat /proc/sys/net/core/message_burst
10
-----
 ICMPv6: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.br-lan.base_reachable_time - use net.ipv6.neigh.br-lan.base_reachable_time_ms instead
 nr_pdflush_threads exported in /proc is scheduled for removal
[1442]: Got DHCPv6 request
442]: DHCPV6 RENEW IA_NA from 0001000122c5a049b827eb1488d2 on br-lan: no binding
2136.560855] net_ratelimit: 73 callbacks suppressed
2141.780868] net_ratelimit: 102 callbacks suppressed
2142.661281] nf_conntrack: nf_conntrack: table full, dropping packet
6]: LCP terminated by peer
6]: Connect time 536.0 minutes.
6]: Sent 194330912 bytes, received 1619926275 bytes.
036]: Failed to send RS (Permission denied)
: Network device 'pppoe-wan' link is down
: Network alias 'pppoe-wan' link is down
: Interface 'wan_6' has link connectivity loss
: Interface 'wan' has lost the connection
: Interface 'wan_6' is disabled
606]: Modem hangup
606]: Connection terminated.
6]: Connect time 536.0 minutes.
6]: Sent 194330912 bytes, received 1619926275 bytes.
6]: Sent PADT
6]: Exit.
036]: Failed to send DHCPV6 message to ff02::1:2 (Permission denied)
: Interface 'wan' is now down!
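The "nf_conntrack: table full, dropping packet" line means the entry count reached nf_conntrack_max. Below is a small monitoring sketch that compares /proc/sys/net/netfilter/nf_conntrack_count with nf_conntrack_max. It could flag the condition before packets are dropped. The 90% warning threshold is an arbitrary choice for illustration.

```python
from pathlib import Path


def conntrack_fill(root="/proc/sys/net/netfilter"):
    """Return (count, limit) from the nf_conntrack sysctls, or None if unavailable."""
    base = Path(root)
    try:
        count = int((base / "nf_conntrack_count").read_text())
        limit = int((base / "nf_conntrack_max").read_text())
    except (OSError, ValueError):
        return None  # conntrack module not loaded, or not running on Linux
    return count, limit


def is_near_full(count, limit, threshold=0.9):
    """True once the table is at least `threshold` full (arbitrary example threshold)."""
    return count >= threshold * limit


usage = conntrack_fill()
if usage is None:
    print("nf_conntrack sysctls not available")
else:
    count, limit = usage
    status = "WARNING: nearly full" if is_near_full(count, limit) else "ok"
    print(f"{count}/{limit} entries, {status}")
```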

moeller0 · January 28, 2019, 12:22pm

I guess the issue is NAT. With masquerading, you effectively use the 16-bit port number to multiplex both the internal host IP (only transiently) and the port number for that host, so unless you avoid masquerading you are limited to at most 64k connections. Because of the timeouts mentioned above, especially for UDP, this limit does not cap concurrent connections directly. Rather, it applies on average over the timeout period, since UDP has no way (and even TCP is not guaranteed) to gracefully tear down a connection in a way conntrack could intercept to clear it from its table.
I have a feeling you know all of this already, and I may have misunderstood the actual problem, so forgive me if this is less than helpful...
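The averaging effect can be put in numbers. If each NAT mapping stays reserved for the timeout after its last packet, the port space supports at most ports / timeout new flows per second. Below is a sketch of that bound. As I understand Linux NAT, a mapping only needs the full tuple to be unique, including the remote address and port, so the 64k figure is the worst case in which all flows share one remote endpoint. The 30-second timeout is an example value.

```python
PORT_SPACE = 2**16  # the 16-bit port field


def max_new_flows_per_sec(timeout_sec, ports=PORT_SPACE):
    """Highest sustained rate of new NAT mappings before the port space runs out,
    assuming each mapping stays reserved for timeout_sec after its last packet."""
    return ports / timeout_sec


print(f"{max_new_flows_per_sec(30):.1f} new flows/s with a 30 s timeout")
```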

xiaobo · January 28, 2019, 1:02pm

@moeller0 Thank you. I solved the "nf_conntrack: table full" problem by expanding the conntrack table to 264192 x 10 = 2641920 entries, but now the kernel logs "net_ratelimit: 102 callbacks suppressed". What does that mean?
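Raising the entry limit tenfold without raising the bucket count makes each hash chain longer when the table is full. Longer chains mean more comparisons per packet lookup. Below is a sketch of that arithmetic. It assumes nf_conntrack_buckets stayed at 264192, which the post does not say explicitly.

```python
import math


def average_chain_length(max_entries, buckets):
    """Average entries per hash bucket when the conntrack table is completely full."""
    return max_entries / buckets


def buckets_for(max_entries, target_chain_length):
    """Buckets needed to keep the full-table average chain at target_chain_length."""
    return math.ceil(max_entries / target_chain_length)


print(average_chain_length(264192, 264192))   # original: max == buckets
print(average_chain_length(2641920, 264192))  # max raised tenfold, buckets unchanged
print(buckets_for(2641920, 4))
```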

This rate limit is a mechanism Linux uses to avoid a kind of denial of service in which every message is logged and log storage explodes. Before the kernel logs such a message with printk(), it calls a rate-limit check (printk_ratelimit(), or net_ratelimit() in networking code) to decide whether the message is printed.

The generic limit is tuned with /proc/sys/kernel/printk_ratelimit and /proc/sys/kernel/printk_ratelimit_burst. Messages gated by net_ratelimit() use the equivalent /proc/sys/net/core/message_cost and /proc/sys/net/core/message_burst. Both pairs default to 5 and 10, respectively. That is, the kernel allows 10 messages to be logged every 5 seconds. Beyond this limit, the kernel drops the messages and later logs "net_ratelimit: N callbacks suppressed".

If you want to turn off the rate-limit mechanism so that every message is logged, set message_cost to 0. However, once rate limiting is off, the system is exposed to attacks that flood the log.
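The behaviour described above can be modelled as a fixed window. In each window of message_cost seconds, at most message_burst messages pass. The rest are counted, and that count is reported when the next window starts. Below is a simplified Python model of that logic. It loosely follows the kernel's lib/ratelimit.c but is not a line-by-line port. It also shows why message_cost = 0 disables limiting.

```python
class RateLimit:
    """Simplified fixed-window rate limiter.

    interval: window length in seconds (like net.core.message_cost); 0 disables limiting.
    burst: messages allowed per window (like net.core.message_burst).
    """

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.begin = None   # start time of the current window
        self.printed = 0    # messages allowed in the current window
        self.missed = 0     # messages suppressed in the current window
        self.reports = []   # counts that would appear as "N callbacks suppressed"

    def allow(self, now):
        """Return True if a message at time `now` (seconds) may be logged."""
        if self.interval == 0:
            return True
        if self.begin is None:
            self.begin = now
        if now > self.begin + self.interval:
            # Window expired: report the suppressed count and start a new window.
            if self.missed:
                self.reports.append(self.missed)
            self.begin = now
            self.printed = 0
            self.missed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False


rl = RateLimit(interval=5, burst=10)
allowed = sum(rl.allow(t / 100) for t in range(100))  # 100 messages within one second
rl.allow(6.0)                                          # first message after the window
print(allowed, rl.reports)
```

With the default 5 and 10, a burst of 100 messages lets 10 through, and the next window reports 90 suppressed. This matches the shape of the "net_ratelimit: 102 callbacks suppressed" lines in the log.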

xiaobo · January 30, 2019, 1:36am

The PPPoE link lost its connection.

I found a PPPoE settings issue:

github.com/openwrt/luci

LCP echo failure threshold 0 does not behave as described in LuCI

opened 07:08PM - 25 Aug 18 UTC

bill888uk

For BT Home Hub 5A configured to use PPPoE or PPPoA protocol for example. In **LuCI→Network→Interfaces→WAN→Advanced Settings,** there are two default settings displayed: LCP echo failure threshold 0, LCP echo interval 5. On the page, the 'default' setting **LCP echo failure threshold** to Zero, implies all LCP echo failures would be ignored. (see attached image) But I found this to be incorrect/misleading. There is no **keepalive** option found in the WAN interface section of /etc/config/network on Home Hub 5A when selecting PPPoE or PPPoA protocols. But there appears to be a fixed timeout (possibly 5 seconds) on the Home Hub 5A when I use PPPoE on VDSL connection. ie. not infinite as quoted in LuCI. I don't know if this is the same behaviour for all devices. If **LCP echo failure threshold** is set to a non-Zero value, then **keepalive** option is CREATED in the WAN interface within /e/c/network file. The description displayed in LuCI for LEDE and OpenWRT 18: 'Presume peer to be dead after a given amount of LCP echo failures, use 0 to ignore failures' perhaps needs to be changed? eg. perhaps simplest solution is to remove the phrase **'use 0 to ignore failures'** ? This closed ticket may be of interest (I added my observations affecting HH5A running LEDE 17 to end of the closed ticket) https://bugs.openwrt.org/index.php?do=details&task_id=1259 Would it be possible to add 'keepalive_adaptive' setting into LuCI mentioned in above ticket in the future? Another ticket on same subject https://bugs.openwrt.org/index.php?do=details&task_id=854 ![0lcpechofailure](https://user-images.githubusercontent.com/26403991/44621333-7ec4ee80-a89c-11e8-88bd-829fa473eaaf.jpg)

github.com/lede-project/source

ppp: remove hardcoded lcp-echo-failure, lcp-echo-interval values

committed 01:19PM - 30 Aug 18 UTC

jow-

+2 -2

OpenWrt used to ship hardcoded defaults for lcp-echo-failure and lcp-echo-interval in the non-uci /etc/ppp/options file. These values break uci support for *disabling* LCP echos through the use of "option keepalive 0" as either omitting the keepalive option or setting it to 0 will result in no lcp-echo-* flags getting passed to the pppd cmdline, causing the pppd process to revert to the defaults in /etc/ppp/options. Address this issue by letting the uci "keepalive" option default to the former hardcoded values "5, 1" and by removing the fixed lcp-echo-failure and lcp-echo-interval settings from the /etc/ppp/options files. Ref: https://github.com/openwrt/luci/issues/2112 Ref: https://dev.archive.openwrt.org/ticket/2373.html Ref: https://bugs.openwrt.org/index.php?do=details&task_id=854 Ref: https://bugs.openwrt.org/index.php?do=details&task_id=1259 Signed-off-by: Jo-Philipp Wich <jo@mein.io>
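Per the pppd man page, pppd sends an LCP echo-request every lcp-echo-interval seconds. It presumes the peer dead after lcp-echo-failure echo-requests go unanswered, so the detection window is roughly the product of the two. Below is a hypothetical helper modeled on the "failure, interval" form quoted in the commit ("5, 1"). The interval used when only a failure count is given is an assumption, and OpenWrt's own ppp.sh parsing may differ in edge cases.

```python
DEFAULT_INTERVAL = 5  # assumption: interval used when only a failure count is given


def parse_keepalive(value, default_interval=DEFAULT_INTERVAL):
    """Hypothetical parser: split '5, 1' or '5 1' into (lcp_echo_failure, lcp_echo_interval).

    Returns None when the failure count is missing or 0, i.e. LCP echoes disabled.
    """
    parts = value.replace(",", " ").split()
    if not parts or int(parts[0]) <= 0:
        return None
    failure = int(parts[0])
    interval = int(parts[1]) if len(parts) > 1 else default_interval
    return failure, interval


def seconds_until_dead(failure, interval):
    """Approximate time of unanswered echoes before pppd presumes the peer dead."""
    return failure * interval


for setting in ("5, 1", "10 5", "0"):
    parsed = parse_keepalive(setting)
    if parsed is None:
        print(repr(setting), "-> LCP echoes disabled")
    else:
        print(repr(setting), "->", parsed, "~", seconds_until_dead(*parsed), "s")
```

With the former hardcoded pair, about 5 seconds of unanswered echoes is enough for pppd to declare the link dead. This is consistent with the "fixed timeout (possibly 5 seconds)" reported in the LuCI issue. Raising either number lengthens that window.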

xiaobo · January 30, 2019, 11:24am

Why does the PPPoE link drop automatically when the number of connections reaches 1202124?