SACK Panic: Multiple TCP-based remote denial of service vulnerabilities

micmac1 · June 18, 2019, 6:47pm

Thanks for this! Had to install iptables-mod-ipopt to be able to run the iptables line on 18.02.

diizzy · June 18, 2019, 7:42pm

Which I wrote before @eduperez edited his post (indicated by post #4)

jow · June 18, 2019, 7:44pm

I am very sorry then. Didn't notice that the post was edited earlier.

diizzy · June 18, 2019, 7:49pm

No worries

eduperez · June 18, 2019, 8:40pm

My mistake, I should have prepared all the info beforehand, instead of editing the post as I gathered it. I thought this was a far worse issue than it really seems to be... must confess I assumed most of the internet would be down by now.

Mushoz · June 18, 2019, 9:07pm

Do these vulnerabilities warrant a new point release for the 18.06 and 17.01 branches while 19.07 is still being prepared?

vk496 · June 19, 2019, 3:30pm

Should be....

Anyone inside your LAN could send the malicious packets to your LuCI port and take down the entire network...

eduperez · June 19, 2019, 4:43pm

Also, many people have TCP-based services running on the router that are open to the outside world (OpenVPN, SSH, LuCI, ...).

jeff · June 19, 2019, 4:49pm

If it is the stack itself, whether you have services running or not won't matter, as routed packets go through the stack.

jeff · June 20, 2019, 2:21pm

From http://lists.infradead.org/pipermail/openwrt-devel/2019-June/017704.html

please merge any outstanding changes that should be
part of LEDE 17.01.7 and OpenWrt 18.06.3 into the
respective lede-17.01 and openwrt-18.06 branches until
Friday, the 21st of June at 09:00 UTC.

[Jo] will tag these branches then and start building
corresponding binary releases shortly after.

The v17.01.7 release will also mark the end-of-life of
the LEDE 17.01 series - we'll decommission the LEDE
17.01 related build resources and repurpose them for
producing 19.07 binaries.

lleachii · June 20, 2019, 4:54pm

FYI, I flashed a snapshot today that included the fix, and LuCI fails upon install, even when resetting to default configs.

M_T · June 20, 2019, 7:09pm

As I understand it, packets received and forwarded by a router would not affect the router. (The router doesn't acknowledge the packets.) Only the destination host (which may be a proxy) would be affected. If the destination of the packets happens to be the router, then it could be affected.

Also, as of the publication of these vulnerabilities, I don't believe there were exploits in the wild. However, as the attack complexity is low, I imagine it won't be long before these hit everyone. Thankfully, the mitigations are simple, for systems that have an alert sysadmin.

I manage a site where Cisco routers are on the border. I looked to see if I could block TCP packets with a low MSS in the TCP options at the border, but I saw no way to do that. Ouch.... Heaven help the IoT devices vulnerable to this.

One of the main reasons I chose OpenWrt for my home network, rather than the vendor's s/w, was exactly this kind of situation -- agility and quick response to serious issues. I am looking forward to benefiting from the active development & maintenance of OpenWrt!

anon45274024 · June 20, 2019, 7:19pm

That is not correct I am afraid. My node had been suffering from the attack for more than 2 weeks prior to recent public disclosure with annoying reboots that left no trace in /sys/kernel/debug/crashlog and thus no source to debug from.

Whether the firewall is off, or set to reject or drop, does not matter: the packets reach the kernel for processing in any case.

I reckon Netflix found the bug not during research but because their nodes suffered from the attacks and they had to investigate.

M_T · June 20, 2019, 10:53pm

I stand by my belief as none of the CVSS scores of the CVEs indicated positively or negatively that there were exploits in the wild. The announcement from Netflix does not address this. I am not aware of any publication about these vulnerabilities that has indicated there were active exploits. Typically, if there are exploits in the wild, this would be noted to alert sysadmins to address this immediately. Had you some proof that your mystery crashes on your device were caused by this exploit, that would be most interesting, but you seem to assert you have no evidence as to the specific cause (power glitches, memory, etc.). To counter that, my (work) site has many hosts with some open TCP ports and I am unaware of any unexplained crashes, but of course that doesn't prove anything. (On the other hand, maybe your node is a target for bad actors while my company is not.)

Good luck with your crashes. I hope you can find the cause. If you can show they were from these SACK attacks, I would appreciate a private message indicating how you were able to identify that and where the TCP connection(s) was from.

anon45274024 · June 21, 2019, 5:47am

That is of course your prerogative.

It does not address a lot of things - especially how they "identified" the issue.

The CVE had already been lodged on April 23 this year.

Though Netflix does not point it out, and only mentions it in passing as a potential workaround, imho the accelerant for the crash is net.ipv4.tcp_mtu_probing being set to 1 while net.ipv4.tcp_sack is also set to 1. The former setting is not widely known and even less widely deployed.
That could account for the lack of widespread reports, even now that the vulnerability is in the open, especially for the millions/billions of unpatched Android | IoT devices that are WAN facing.

Another consideration is that the reboots may go unnoticed, at least on CPE devices, as the user would first have to notice the interruption in connectivity and then bother to look into the logs - which would reveal nothing.

There is a support ticket that was opened when the issue was first observed, and intensive debugging efforts followed.
Although the kernel logs did not reveal the cause, I had a gut feeling that this was caused by an outside attack.
Unfortunately, I did not know what to look for in the tcpdumps back then, and the dumps appeared to me only as the usual uninvited traffic from nefarious actors.
Once the CVE details were known I took another look and was able to pinpoint the packets: all of them have only the TCP flag S set and none carries an actual MSS value, but the MSS size calculated from the (small) packet size and MTU matches the pattern described by Netflix.

The exploit (once known as such) was successfully and immediately countered by setting net.ipv4.tcp_sack to 0 whilst leaving net.ipv4.tcp_mtu_probing set to 1 - that is, with the unpatched kernel.

With the patched kernel applied and both aforementioned settings set to 1, no further such reboots have been observed.
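
For anyone who wants to check their own box for the combination described above, here is a minimal Python sketch. It assumes a Linux system that exposes sysctls under /proc/sys; the function names are made up for illustration, and the idea that this particular combination matters is the poster's hypothesis, not something confirmed in this thread.

```python
from pathlib import Path

# The two sysctls named in the post above.
SYSCTLS = ("net.ipv4.tcp_sack", "net.ipv4.tcp_mtu_probing")


def read_sysctl(name, root="/proc/sys"):
    """Return the integer value of a sysctl, or None if it cannot be read."""
    path = Path(root) / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def matches_reported_combination(values):
    """True when both settings are 1 (the combination the poster suspects)."""
    return all(values.get(name) == 1 for name in SYSCTLS)


if __name__ == "__main__":
    current = {name: read_sysctl(name) for name in SYSCTLS}
    for name, value in current.items():
        print(f"{name} = {value}")
    if matches_reported_combination(current):
        print("Both are set to 1: the combination described in the post.")
```

The check only reads the values; changing them (for example setting net.ipv4.tcp_sack to 0 as described above) is left to the usual sysctl tooling.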

M_T · June 21, 2019, 8:19pm

It isn't clear to me what you are describing about the packets. If you could share the tcpdump trace of some of those SYN packets, it would help. Change the IPs if you want. (Indeed, if you have the full set of packets to/from the attacking IP(s), I'd really like to analyze that.) Maybe I'd understand better if I could see the bug report.

The exploit involves the other end's MSS. You can't calculate that -- they have to tell you, or else you pick what you want (as I understand it).

"TCP flag S" in tcpdump is the SYN flag for TCP packets, which specifies this is the first packet of a (attempted) TCP connection from each end. If MSS is specified, it will be specified in that initial packet from each end. Each end may specify sackOK to indicate that end supports SACK. If MSS is not specified, the RFCs say that any segment size may be used.

Here is a page showing normal usage of SACK, not an attack:
http://packetlife.net/blog/2010/jun/17/tcp-selective-acknowledgments-sack/
Notice that tcpdump of the SYN packet (from the sample packet capture on that page) shows sackOK and the MSS for both sender & receiver.

11:20:10.371775 IP 192.168.1.3.58816 > 63.116.243.97.http: Flags [S], seq 3851697578, win 5840, options [mss 1460,sackOK,TS val 1545573 ecr 0,nop,wscale 7], length 0
11:20:10.394060 IP 63.116.243.97.http > 192.168.1.3.58816: Flags [S.], seq 2747564191, ack 3851697579, win 5792, options [mss 1460,sackOK,TS val 2375917050 ecr 1545573,nop,wscale 5], length 0

These values of mss are typical.
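
To show where tcpdump gets these fields from, here is a minimal Python sketch that walks a raw TCP options field and pulls out the MSS and SACK-permitted options. The option kinds used (0 = end of list, 1 = NOP, 2 = MSS, 4 = SACK Permitted) are the standard TCP ones. The function name is made up for illustration, and the byte string is hand-built to mirror the options of the first SYN above, not extracted from the capture file.

```python
def parse_tcp_options(raw):
    """Walk a TCP options field and return (mss, sack_permitted).

    mss is None when no MSS option is present.
    """
    mss = None
    sack_permitted = False
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:                      # End of Option List
            break
        if kind == 1:                      # No-Operation, a single padding byte
            i += 1
            continue
        if i + 1 >= len(raw):
            break                          # truncated option, stop parsing
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            break                          # malformed length, stop parsing
        if kind == 2 and length == 4:      # Maximum Segment Size
            mss = int.from_bytes(raw[i + 2:i + 4], "big")
        elif kind == 4 and length == 2:    # SACK Permitted
            sack_permitted = True
        i += length
    return mss, sack_permitted


# Hand-built example: mss 1460, sackOK, a timestamp option, nop, wscale 7
example = bytes.fromhex(
    "020405b4"                # kind 2, len 4, value 0x05b4 = 1460
    "0402"                    # kind 4, len 2
    "080a0017956500000000"    # kind 8 (timestamps), len 10
    "01"                      # nop
    "030307"                  # kind 3 (window scale), len 3, shift 7
)
print(parse_tcp_options(example))  # (1460, True)
```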

Based on the Netflix description of mitigations, I would expect any incoming SYN packet involved in this exploit to have sackOK and to have a low MSS value. I don't quite understand why the low MSS value is critical for the integer overflow in SACK, but anything less than 500 would be suspect to me.
(I have seen multiply encapsulated traffic that had MTU down in that range, resulting in poor bandwidth.) The lower the MSS value is, the more overhead is spent to send the same data (affecting bandwidth, CPU usage, and packet buffers). So, if an attacker specifies an MSS value of 32, then the attacked system has to spend more resources to deliver its traffic with only 32 bytes of data per packet.
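
The overhead argument can be put into rough numbers. This is a back-of-the-envelope Python sketch, assuming 40 bytes of headers per segment (a 20-byte IPv4 header plus a 20-byte TCP header, no options) and an arbitrary 1,000,000-byte payload; both figures are illustrative assumptions, not values from the advisory.

```python
import math

HEADER_BYTES = 40  # assumption: 20-byte IPv4 header + 20-byte TCP header, no options


def segment_cost(payload_bytes, mss):
    """Return (segments, header_bytes) needed to carry payload_bytes at a given MSS."""
    segments = math.ceil(payload_bytes / mss)
    return segments, segments * HEADER_BYTES


for mss in (1460, 32):
    segments, headers = segment_cost(1_000_000, mss)
    print(f"mss={mss}: {segments} segments, {headers} header bytes")
# mss=1460: 685 segments, 27400 header bytes
# mss=32: 31250 segments, 1250000 header bytes
```

Under these assumptions, an MSS of 32 means the sender transmits more header bytes than data, and has to build and track roughly 45 times as many packets for the same payload.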

If the incoming SYN packet doesn't specify a particular MSS (as I understand you to say), I suspect it wouldn't be part of this exploit, but I could be wrong. It certainly wouldn't be blocked by the mitigations that block SYN packets with a specified MSS less than a certain size.
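
The reasoning in the last few paragraphs can be sketched as a small classifier over tcpdump's text output. This is only an illustration in Python: the 500 threshold is the rule of thumb from this post, the function name and the two extra sample lines (using documentation-range IP addresses) are hypothetical, and the regular expressions assume the tcpdump output format shown above.

```python
import re

SUSPECT_MSS = 500  # rule of thumb from the post above, not an official threshold

FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")
OPTIONS_RE = re.compile(r"options \[([^\]]*)\]")
MSS_RE = re.compile(r"\bmss (\d+)")


def classify_syn(line, threshold=SUSPECT_MSS):
    """Classify one tcpdump text line as "not-syn", "no-mss", "suspect" or "normal"."""
    flags = FLAGS_RE.search(line)
    if flags is None or "S" not in flags.group(1):
        return "not-syn"
    options_match = OPTIONS_RE.search(line)
    options = options_match.group(1) if options_match else ""
    mss = MSS_RE.search(options)
    if mss is None:
        return "no-mss"   # a filter on a specified MSS has nothing to match here
    if "sackOK" in options and int(mss.group(1)) < threshold:
        return "suspect"
    return "normal"


# First line is the SYN quoted above; the other two are hypothetical.
normal_line = ("11:20:10.371775 IP 192.168.1.3.58816 > 63.116.243.97.http: Flags [S], "
               "seq 3851697578, win 5840, options [mss 1460,sackOK,TS val 1545573 ecr 0,"
               "nop,wscale 7], length 0")
low_line = ("12:00:00.000000 IP 192.0.2.1.40000 > 198.51.100.1.http: Flags [S], "
            "seq 1, win 5840, options [mss 48,sackOK], length 0")
bare_line = ("12:00:01.000000 IP 192.0.2.1.40001 > 198.51.100.1.http: Flags [S], "
             "seq 1, win 5840, length 0")

for sample in (normal_line, low_line, bare_line):
    print(classify_syn(sample))
```

A SYN that carries no MSS option at all falls into the "no-mss" bucket, which mirrors the point above: a rule that matches on a specified low MSS would not catch it.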

Did you turn tcp_sack back on and confirm that the crashes happened again? That would seem to confirm something related to SACK is happening. If I were in your position, I'd ask "Why is it me and not lots of other people reporting these problems?" Maybe you upset a particular group of bad guys, or maybe you're a person of high value to the bad guys. Maybe it isn't this exploit, but something else (a slightly different attack or a misconfiguration). Or maybe I'm misunderstanding what you are saying about the packets.

anon45274024 · June 22, 2019, 4:54am

That would appear to be the case if the outpouring is anything to go by.

Reading other users' posts helps (to mitigate redundancy and reduce clutter in a thread).

Humble suggestion - lose the patronizing attitude; you are not other people and should not tell them what to do.

M_T · June 24, 2019, 8:57pm

n8v8R, since you believe you suffered an attack, you may want to do the Internet community a service and report it so it can really be verified.

If you go to https://www.securityfocus.com/bid/108801/exploit
you will see they say

Linux Kernel CVE-2019-11477 Integer Overflow Vulnerability
Currently, we are not aware of any working exploits. If you feel we are in error or if you are aware of more recent information, please mail us at: vuldb@securityfocus.com.

anon45274024 · June 27, 2019, 8:28am

That is irrelevant to the existence of the kernel bug and to the fact that it has been patched.
Whether an exploit exists and is actively used should not decide whether to patch a system with a known vulnerability.

It would be different if it were a 0-day exploit or an exploit for a known but unpatched CVE.

It does not really bother me whether you trust that there are no exploits (being tried); as mentioned before, that is everyone's prerogative.
Since you seem so certain that there is no such exploit, perhaps you prefer to leave your system unpatched or not deploy any suitable countermeasure.

This whole theme - prove that an exploit exists - seems rather superfluous.

kelpknish · June 27, 2019, 10:47am

I see in the changelog ( https://openwrt.org/releases/18.06/changelog-18.06.3 ) for the upcoming 18.06 maintenance release that this issue is patched on the 4.9 and 4.14 kernels (search for "sack" and it'll come up near the bottom).

Comparing the shortlogs ( https://git.openwrt.org/?p=openwrt/openwrt.git;a=shortlog;h=refs/tags/v18.06.3 ), the 4.9 kernel (which my device uses) appears to have been upgraded to 4.9.182. This is great news; I can't thank the devs enough for their hard work, especially this month.