I am running into a weird issue with conntrack on my OpenWRT router. Torrenting opens a lot of connections, which is completely normal. However, even with just 4 torrents uploading and not that many peers per torrent (a little over 100 peers for the busiest one), the conntrack table starts to fill up. Eventually I had 16384 open connections and the kernel started dropping packets, as expected. I could simply increase nf_conntrack_max to cheat, but that would not solve the underlying issue.
While torrenting is a worst-case scenario for NAT because of the number of connections it makes, 4 torrents shouldn't be enough to hit the default conntrack limit of 16k. So I did some digging. On the PC I torrent from, I ran the following command:
watch sudo cat /proc/sys/net/netfilter/nf_conntrack_count
And on my router:
watch cat /proc/sys/net/netfilter/nf_conntrack_count
I used them to keep track of the connection count. When I started the torrent program, everything began normally: the number of connections rose rapidly, with the router's count about 200 higher because of other devices on a relatively quiet network. After a while, though, a discrepancy appeared, with the router tracking thousands more connections than the computer. For some reason, old connections were closed properly on the computer but not on the router.
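For anyone who wants to reproduce the comparison, a logged version of the same measurement is easier to line up afterwards than two watch windows. This is only a minimal sketch: the function name sample_conntrack and the log path are made up for illustration, and on the PC it may need to run as root, just like the cat above.

```shell
# Print one "unix_timestamp count" sample.
# The optional argument is the file to read; it defaults to the same
# /proc entry that the watch commands above poll.
sample_conntrack() {
    file="${1:-/proc/sys/net/netfilter/nf_conntrack_count}"
    printf '%s %s\n' "$(date +%s)" "$(cat "$file")"
}
```

Running it in a loop on both machines gives two logs that can be compared by timestamp:
while sleep 5; do sample_conntrack; done >> /tmp/conntrack.log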
I let this test run up to 4.7k connections on the computer, which resulted in 7.4k entries in the router's conntrack table. I then closed the torrent program on the computer. The conntrack table on the computer shrank rapidly, dropping below 500 within several minutes and falling further as time progressed. However, the conntrack count on the router stalled at around 4.6k. Looking at the conntrack table itself, there were thousands of ESTABLISHED connections that were never closed. They eventually timed out after about 2 hours, which makes sense, as the default nf_conntrack_tcp_timeout_established value on the router is 7440 seconds.
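The per-state breakdown can be pulled out of the table with a short awk filter. This is a sketch under two assumptions: the helper name count_tcp_states is made up, and the input is in /proc/net/nf_conntrack format, where the TCP state is the sixth field. The output of conntrack -L lacks the two leading columns, so the field numbers would need adjusting there.

```shell
# Count TCP conntrack entries per state.
# Expects lines in /proc/net/nf_conntrack format on stdin:
#   l3proto l3num l4proto l4num timeout STATE src=... dst=... ...
# Non-TCP entries (UDP, ICMP, ...) carry no state field and are skipped.
count_tcp_states() {
    awk '$3 == "tcp" { count[$6]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

On the router it would be fed from the live table:
count_tcp_states < /proc/net/nf_conntrack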
However, the computer's nf_conntrack_tcp_timeout_established is set to a much larger value of 432000, yet it doesn't run into the same issue of stale connections filling up the conntrack table.
I did some more testing, and if I wait long enough the 16k conntrack limit is reached again, with only ~5-6k connections on the computer. So the limit is being reached because stale connections fill up the conntrack table, not because there are legitimately that many connections.
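Since the established timeout is only one of several TCP-related conntrack settings, it may be worth dumping all of them on both machines and diffing the results. A rough sketch, assuming the settings live under /proc/sys/net/netfilter as above; the function name dump_tcp_settings is made up, and the exact set of files depends on the kernel version.

```shell
# Print every nf_conntrack_tcp_* setting as name=value, one per line.
# The optional argument overrides the directory to read from.
dump_tcp_settings() {
    dir="${1:-/proc/sys/net/netfilter}"
    for f in "$dir"/nf_conntrack_tcp_*; do
        # Skip the unexpanded pattern (no matches) and unreadable files.
        [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Saving the output on each machine (for example dump_tcp_settings > /tmp/tcp-settings.txt) and running diff on the two files shows which timers and flags differ between the router and the computer.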
Is this expected behavior in this setup? Why is the conntrack table on the router keeping connections in memory that my computer already cleared out?