Torrents causing conntrack table to overflow

I am running into a weird issue with conntrack on my OpenWRT router. When torrenting, a ton of connections are made, which is completely normal. However, even with just 4 torrents uploading and not that many peers per torrent (a little over 100 peers for the busiest one), the conntrack table starts to fill up. Eventually I had 16384 open connections and the kernel started dropping packets, as expected. I could simply increase nf_conntrack_max if I wanted to cheat, but that wouldn't actually solve the issue at hand.

While torrenting is a worst-case scenario for NAT given the number of connections involved, 4 torrents shouldn't be enough to hit the default conntrack limit of 16k. So I did some digging. On the PC I torrent from, I ran the following command:

watch sudo cat /proc/sys/net/netfilter/nf_conntrack_count

And on my router:

watch cat /proc/sys/net/netfilter/nf_conntrack_count

I used them to keep track of the connection counts. When I started the torrent client, everything began normally: the number of connections rose rapidly, with the router's count about 200 higher because of other devices on a relatively quiet network. But after a while a discrepancy started to appear, with the router having thousands more connections than the computer. For some reason, old connections were properly closed on the computer but not on the router.
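To compare the router and the PC over a longer run than `watch` keeps on screen, the counter can be logged with timestamps on both machines and the two logs lined up afterwards. A minimal sketch; `log_conntrack_count` is a hypothetical helper name, and it assumes a POSIX `sh` with `date +%s` (available in BusyBox on OpenWrt and on a typical Linux desktop):

```sh
# Hypothetical helper: log_conntrack_count FILE SAMPLES INTERVAL
# Prints one "<unix-time> <count>" line per sample, reading the counter
# from FILE (normally /proc/sys/net/netfilter/nf_conntrack_count).
log_conntrack_count() {
    src=$1
    samples=$2
    interval=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        printf '%s %s\n' "$(date +%s)" "$(cat "$src")"
        i=$((i + 1))
        # Skip the sleep after the final sample
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `log_conntrack_count /proc/sys/net/netfilter/nf_conntrack_count 720 5 > router.log` samples every 5 seconds for an hour. Running the same thing on the PC gives a second log that can be compared line by line, provided both clocks are reasonably in sync (e.g. via NTP).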

I let this test run up to 4.7k connections on the computer, which resulted in 7.4k entries in the router's conntrack table. I then closed the torrent client. The conntrack table on the computer shrank rapidly, dropping below 500 within several minutes and further still as time went on. However, the conntrack count on the router stalled at around 4.6k. Looking at the conntrack table itself, there were thousands of ESTABLISHED connections that were never closed. They would eventually time out after about 2 hours, which makes sense, as the default nf_conntrack_tcp_timeout_established value on the router is 7440 seconds (2 hours 4 minutes).
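One way to confirm those ESTABLISHED entries are stale rather than quiet but alive: as far as I understand conntrack, the fifth field of each line in /proc/net/nf_conntrack is the number of seconds left before the entry expires, and for an ESTABLISHED entry that countdown is normally reset to nf_conntrack_tcp_timeout_established whenever a packet matches it. So `timeout - remaining` roughly approximates how long the entry has been idle. A sketch under those assumptions (`stale_established` is a hypothetical helper name):

```sh
# Hypothetical helper: stale_established FILE TIMEOUT MIN_IDLE
#   FILE     - conntrack table, normally /proc/net/nf_conntrack
#   TIMEOUT  - current nf_conntrack_tcp_timeout_established (e.g. 7440)
#   MIN_IDLE - count entries idle for at least this many seconds
# Field 3 is the L4 protocol name, field 5 the seconds left before the
# entry expires, field 6 the TCP state.
stale_established() {
    awk -v t="$2" -v min="$3" '
        $3 == "tcp" && $6 == "ESTABLISHED" {
            total++
            if (t - $5 >= min) stale++
        }
        END { printf "%d of %d ESTABLISHED entries idle >= %ds\n", stale, total, min }
    ' "$1"
}
```

For example, `stale_established /proc/net/nf_conntrack 7440 1800` reports how many ESTABLISHED entries have seen no traffic for at least 30 minutes. Treat the result as an estimate, since the kernel can apply shorter timeouts in some cases (for example, connections with outstanding retransmissions).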

However, the computer's `nf_conntrack_tcp_timeout_established` is set to a much larger value of 432000 (5 days), yet it doesn't run into the same issue of stale connections filling up its conntrack table.

I did some more testing: if I wait long enough, the 16k conntrack limit is reached again while the computer shows only ~5-6k connections. So the limit is being hit because stale connections fill up the router's conntrack table, not because we're legitimately reaching it.

Is this expected behavior in this setup? Why is the conntrack table on the router keeping connections in memory that my computer already cleared out?

Are you using DHT, or do you have UDP enabled/enforced for client connections?

Even though UDP is considered connectionless, conntrack will keep track of those UDP sessions all the same. Your PC might stop counting those UDP sessions once the torrent client stops listening on the UDP port, but the router has no idea since UDP by design doesn't have a RST/FIN like TCP does.

If a torrent is popular, you'll have several hundred, potentially thousands, of other clients attempting to connect to you even an hour after you've closed your torrent client. Look at the actual conntrack table, and use grep/wc to check whether it's mostly TCP or UDP.

From my router, for one specific torrent client:
root@router:/proc/13512/net# cat nf_conntrack | grep "tcp.*30522" | wc -l
117
root@router:/proc/13512/net# cat nf_conntrack | grep "udp.*30522" | wc -l
1985
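The same check can be done for every protocol in a single pass over the table. A minimal sketch, assuming the usual /proc/net/nf_conntrack layout where the third field is the layer-4 protocol name (`count_by_proto` is a hypothetical helper):

```sh
# Hypothetical helper: count_by_proto FILE
# Prints "<protocol> <entries>" for each L4 protocol (field 3) in the
# conntrack table FILE, highest count first.
count_by_proto() {
    awk '{ n[$3]++ } END { for (p in n) print p, n[p] }' "$1" | sort -k2,2nr
}
```

Run it as `count_by_proto /proc/net/nf_conntrack` on the router.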

root@OpenWrt:~# cat /proc/net/nf_conntrack | grep -i tcp | wc -l
4513

and

root@OpenWrt:~# cat /proc/net/nf_conntrack | grep -i udp | wc -l
41

This is after I closed my torrent client (around 30-60 minutes ago now), so the problematic connections are definitely TCP-based. I do have DHT enabled, though.

Then DHT and UDP aren't the problem. If you grep for your specific torrent port plus TCP, does the number increase or decrease over time while the torrent client is offline? It should decrease until it hits zero, unless conntrack creates entries even when the TCP destination port was unreachable and no handshake took place.

Personally, on my network with about 3 heavy torrent users, I mostly hit the 16k conntrack limit because of one user who runs uTorrent with several hundred torrents. That's rare, however; I mostly stay below 8k. I haven't touched the default sysctl values.

Interesting...

root@OpenWrt:~# cat /proc/net/nf_conntrack | grep -i tcp | grep -i 31036 | wc -l
2

Edit: Starting qBittorrent again shows an increasing number of port 31036 connections, so I am definitely grepping for the correct port. However, there are way more connections using a different port. I am no expert on the torrent protocol; is that to be expected?

Good point, I forgot about that. Inbound torrent connections will connect to port 31036.
But your outbound connections go to whatever destination port other clients advertise, and the source port on your system will most likely be random.
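To see how the lingering entries split between the two cases, the original-direction tuple (the first `dport=` on each line) shows which side opened the connection. A sketch, assuming the port forward uses the same external and internal port (31036 here), since the original tuple records the port before DNAT. It counts every device's TCP entries, so connections from other LAN hosts also land in `outbound` (`split_by_direction` is a hypothetical helper):

```sh
# Hypothetical helper: split_by_direction FILE LISTEN_PORT
# Classifies TCP conntrack entries by the first dport= on each line, which
# belongs to the original-direction tuple (the side that opened the
# connection). Equal to LISTEN_PORT: a peer connected in; otherwise: outbound.
split_by_direction() {
    awk -v port="$2" '
        $3 != "tcp" { next }
        {
            dp = ""
            # First dport= belongs to the original-direction tuple
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^dport=/) { dp = substr($i, 7); break }
            }
            if (dp == port) inbound++
            else outbound++
        }
        END { printf "inbound=%d outbound=%d\n", inbound, outbound }
    ' "$1"
}
```

Usage on the router: `split_by_direction /proc/net/nf_conntrack 31036`.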

Right. And since the number of connections on port 31036 drops to 0 within a few minutes of closing the torrent client, the lingering connections seem to be outbound ones. The question is why this is happening. Is this a conntrack bug? Is it expected behavior (and if so, why)? Or could it be a qBittorrent (libtorrent) bug?

Just 15 connections with a `*_WAIT` status, out of 4.8k connections currently open (with the torrent client closed).
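For a full breakdown by TCP state in one pass, rather than grepping for individual states, a sketch (hypothetical `state_histogram` helper; it assumes the TCP state is the sixth field of each line, as in /proc/net/nf_conntrack):

```sh
# Hypothetical helper: state_histogram FILE
# Prints "<tcp-state> <entries>" for TCP entries in FILE, highest first.
state_histogram() {
    awk '$3 == "tcp" { n[$6]++ } END { for (s in n) print s, n[s] }' "$1" | sort -k2,2nr
}
```

Run as `state_histogram /proc/net/nf_conntrack`.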

  • Have you altered how your firewall handles TCP FIN or RST packets?
  • Can you run cat /etc/sysctl.conf | grep 'tcp' on the OpenWrt router and post the results?
  • What OpenWrt version are you running?

Maybe cat /etc/sysctl.d/1* | grep tcp ?

No.

root@OpenWrt:~# cat /etc/sysctl.d/* | grep tcp
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=120
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_dsack=1
net.netfilter.nf_conntrack_tcp_timeout_established=7440

19.07.4. Flow offloading (both software and hardware) is disabled. I didn't have time to run Transmission as long as qBittorrent, but it didn't seem to trigger the same behavior. Then again, Transmission was also using fewer connections to begin with, so I will run a longer test tonight to double-check. Even so, I'm not sure that would point to a bug in qBittorrent/libtorrent, since conntrack on the desktop PC running the client didn't have the same issue.

FWIW, in Gargoyle we include a patch that sets the TCP timeout to 600 seconds (10 minutes) for exactly these situations.
2 hours seems excessive?
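For anyone who wants to try the Gargoyle value on OpenWrt, a sketch to run as root on the router. It assumes, as on stock OpenWrt, that /etc/sysctl.conf is applied after the files in /etc/sysctl.d/ and therefore overrides the 7440 default:

```sh
# Takes effect immediately. Existing idle entries keep their old expiry;
# the new value is applied the next time an entry sees a packet.
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=600
# Persist across reboots
echo 'net.netfilter.nf_conntrack_tcp_timeout_established=600' >> /etc/sysctl.conf
```

This only makes stale entries age out faster; it doesn't explain why they stay ESTABLISHED. The trade-off is that a legitimate TCP connection idle for more than 10 minutes without keepalives (an SSH session, for instance) can lose its NAT mapping and break.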

The kernel default is even higher: 5 DAYS. From what I understand, under normal circumstances connections shouldn't stay in ESTABLISHED for very long anyway, but somehow under this workload they do.

Just a quick Google search, but perhaps this might help you:


Seems like Linux doesn't handle torrent traffic very well.

Do you have any packet loss or errors on the interface(s)?
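One way to check this on the router is to read the error and drop counters from /proc/net/dev. A sketch, assuming the standard column layout (after the interface name, 8 receive counters followed by 8 transmit counters) and using a hypothetical `iface_errors` helper:

```sh
# Hypothetical helper: iface_errors FILE
# FILE is normally /proc/net/dev. The first two lines are headers; after
# "iface:" come rx bytes packets errs drop fifo frame compressed multicast,
# then tx bytes packets errs drop fifo colls carrier compressed.
iface_errors() {
    awk 'NR > 2 {
             sub(/:/, " ")   # "eth0:123" -> "eth0 123" when counters are wide
             printf "%s rx_errs=%s rx_drop=%s tx_errs=%s tx_drop=%s\n", $1, $4, $5, $12, $13
         }' "$1"
}
```

Running `iface_errors /proc/net/dev` twice a few minutes apart shows whether any of the counters are actually increasing.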

I'm not experiencing these issues: multiple clients on my LAN run Transmission, some for weeks at a time.

VERY interesting link! It specifically mentions:

On many default configurations, when using iptables with connection tracking (conntrack) set to drop "INVALID" packets, sometimes a great deal of legitimate torrent traffic (especially DHT traffic) is dropped as "invalid." This is typically caused by either conntrack's memory restrictions, or from long periods between packets among peers.

And I have the option to drop INVALID traffic enabled in LuCI. I am going to disable it and see if that fixes the issue. Thank you very much for the pointer!
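For reference, that LuCI setting corresponds to the `drop_invalid` option in the firewall's defaults section, so it can also be toggled from the shell. A sketch to run on the router, assuming the usual single `defaults` section in /etc/config/firewall:

```sh
# Equivalent of disabling the drop-invalid option in LuCI
uci set firewall.@defaults[0].drop_invalid='0'
uci commit firewall
/etc/init.d/firewall restart
```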

Unfortunately, this didn't fix the issue. 12k connections and rising on my router, versus only 4k on my computer.

Nope.

What OpenWRT version are you running?

Currently, OpenWrt SNAPSHOT r13926-f94b09867d.

I haven't had torrent issues on my hardware in years, and any issues I did have have been nonexistent since version 17.

If it helps: before version 17, I had a Titan Wireless TW-533-4. I was about to mount it on a high tower, and while bench testing it, a friend managed to crash it simply by running LimeWire on their laptop. Version 18 fixed all issues like that.

@Mushoz, maybe you have been carrying over old values when upgrading to newer versions? Did you uncheck "Keep settings" for major version upgrades (like from 17 to 18 to 19)?
Have you tried resetting to defaults?

Yes, I always reset to defaults after every major release. I did find out something interesting: 45 minutes after closing my torrent client, there are still 9.8k connections open. The vast majority of them have the same status:

root@OpenWrt:~# cat /proc/net/nf_conntrack | grep -i assured | grep -i established | wc -l
9714

Does this fact help in any way? I am not entirely sure what "ASSURED" means in this context.
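As I understand the kernel's behaviour, ASSURED is set on a TCP entry once the three-way handshake has completed, i.e. conntrack has seen traffic in both directions. It matters here because, when the table is full, the kernel's early-drop logic only evicts entries that are not ASSURED, so stale ASSURED/ESTABLISHED entries stay until their timeout expires while new connections are dropped instead. A sketch to see how many ESTABLISHED entries are protected this way (`assured_split` is a hypothetical helper):

```sh
# Hypothetical helper: assured_split FILE
# Counts TCP ESTABLISHED entries in FILE (normally /proc/net/nf_conntrack)
# with and without the [ASSURED] flag.
assured_split() {
    awk '$3 == "tcp" && $6 == "ESTABLISHED" {
             if (index($0, "[ASSURED]")) a++
             else u++
         }
         END { printf "assured=%d unassured=%d\n", a, u }' "$1"
}
```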