NAT leakage on TL-WR1043ND v4

andreas · February 20, 2017, 7:20pm

I ordered a TP-Link TL-WR1043ND v3 due to it being supported by LEDE/OpenWRT but -- like many others -- I received a v4. At first I ran it with the stock firmware because I was curious to see what it has to offer.

Much to my surprise my uplink provider informed me that the router is emitting packets with source IPs of the local LAN. Apparently, NAT works most of the time but the router fails to NAT some packets. Quite a bummer!

So I flashed the TL-WR1043ND v4 with lede-17.01.0-rc2-r3131-42f3c1f-ar71xx-generic-tl-wr1043nd-v4-squashfs-factory.bin (which BTW works great) and connected it to a sniffer but the NAT leakage continued!? How is it even technically possible that the new firmware still leaks packets of the LAN without rewriting the source IP?

At first I guessed that hardware NAT boost was the culprit. I tried it with hardware NAT disabled on the stock firmware but again, the leakage continued. And in LEDE it is not supported anyway, is it?

DjiPi · February 20, 2017, 8:42pm

Could it be a bad VLAN mapping?

andreas · February 21, 2017, 12:41pm

I haven't changed or added VLAN mappings. The VLAN mappings are as described in the Wiki.
NAT leakage occurs even with factory defaults :-/

DjiPi · February 21, 2017, 6:00pm

Maybe if the stock firmware has a bad mapping or port mirroring, those settings could have been carried over from the original stock firmware configuration to the OpenWrt/LEDE configuration?

Also, TP-Link has in the past changed their hardware without changing the revision number - this happened with the Archer C7 v2, from a specific serial number (215C and up).

What I would try to do is to isolate this behaviour on a specific port (LANxx or WiFi 2.4/5 GHz).

andreas · February 22, 2017, 3:01pm

ACK. I am curious whether this is possible and to what extent. However, I doubt that the NAT leakage in this particular case is software related.

Good point. I haven't tried WiFi due to time constraints and since I don't need it, but I did try all LAN ports as you suggested. Traffic from all of them leaks without NAT.

So, since the NAT leakage occurs with stock firmware and factory defaults, I contacted TP-Link support Germany, who replied within hours, to the point and comprehensively -- incredible! Made me jump for joy.
They agree that this is a reason to exchange the device. I hope to get a new one within a week. And I am looking forward to testing it. Will keep you posted.

DjiPi · February 22, 2017, 3:46pm

Great news! Hope this new one will not behave the same.

I'm curious though: when you said that you connected your device to a sniffer, did you mean that you were running double NAT?

DjiPi · February 22, 2017, 7:40pm

Hardware NAT is not supported, AFAIK, in 17.01.0-rc1, 17.01.0-rc2, or the final release 17.01.0, although there is a custom build with that in mind.

andreas · February 23, 2017, 3:12pm

Folks! It's time to test your routers :-/ I received a new TL-WR1043ND v4 today and this one, too, shows the NAT leakage.

No, personally I use a bridge without IP between the router's WAN port and my uplink network. I have been running tcpdump and Wireshark on 2 completely different types of hardware. Moreover, my uplink provider was the one who first notified me. In the meantime I also verified that the leakage occurs for WiFi clients, too. Whatever is wrong with these routers, it's very wrong.

r43k3n · February 23, 2017, 3:51pm

This is very discouraging. I was actually planning to buy this device next week.

Can you maybe provide more detailed instructions on how to reproduce the bug for those of us who are not technically advanced? That the bug doesn't seem to be software related but rather hardware related is a bummer.

Did you contact TP-Link (again) about this issue? What did they have to say about this?

Edit 1: It was pointed out to me that something similar was happening with v1 until TP-Link patched u-boot to disable the WAN port at boot. Can it be related?

https://dev.openwrt.org/ticket/6819

Edit 2: Also, how are we sure that this bug only affects v4? Users of previous hardware versions should also test their units for the possibility of the bug.

DjiPi · February 23, 2017, 5:21pm

For sure I'm going to test that in my spare time. @andreas: I found something (that you might have already seen) about NAT leakage on Ubuntu, and there are entries to be added to iptables. Maybe you can try this to see if it fixes it?

And to be honest I'm dubious about the soft/hard(ware) related thing. I'm leaning towards a software-related problem. TP-Link are not foolproof when they write their code (U-Boot mentioned by @r43k3n is a good example).

r43k3n · February 23, 2017, 5:29pm

@andreas - I've mentioned this thread to some people outside of this forum and they were very sceptical about it. They were asking for some proof that the bug exists. Do you think you could provide it? I think their request is understandable.

andreas · February 23, 2017, 8:25pm

You can count me in! As I wrote in my 1st post I did not expect anything like this. I would have bet quite something that if there is NAT leakage it is caused by "software" and not "hardware" and that flashing with LEDE would fix it.

@r43k3n I am not sure what you expect as proof. But I'll gladly describe my test setup. And, of course, I can provide traffic dumps.

Test setup: A Windows 7 client wired to a LAN port of the TL-WR1043ND. The router has stock firmware (or LEDE, see previous posts) and factory default settings apart from the quick setup wizard (time zone, DHCP for WAN). At the uplink I have a local network 10.0.0.0/24. Between the router and the uplink network I placed an Ubuntu 16.04.1 PC with 2 Ethernet ports set up as bridge br0 without IP addresses (I can provide instructions for how to do this, if requested), where I run tcpdump.
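
For readers who want to rebuild the sniffer PC described above, here is a minimal sketch of an address-less bridge using iproute2. It is not the exact configuration used in the test: the function name and the default interface names eth0 and eth1 are assumptions, and the commands need root.

```shell
# Hypothetical helper: bridge two NICs without assigning any IP address,
# so the PC only forwards (and can observe) frames between router and uplink.
# The default interface names eth0/eth1 are assumptions; adjust as needed.
setup_sniffer_bridge() {
    router_side="${1:-eth0}"   # NIC cabled to the router's WAN port
    uplink_side="${2:-eth1}"   # NIC cabled to the uplink network

    ip link add name br0 type bridge || return 1
    for nic in "$router_side" "$uplink_side"; do
        ip addr flush dev "$nic"            # member ports carry no IP
        ip link set dev "$nic" master br0   # enslave the port to br0
        ip link set dev "$nic" up
    done
    ip link set dev br0 up
}
```

Once setup_sniffer_bridge has been called as root, tcpdump can be attached to br0 as in the capture command quoted in this post.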

The biggest problem so far is triggering the leakage. My very rough estimate is that on average far less than 1% of the packets are emitted without a properly rewritten source IP. Just waiting for it usually worked overnight when the router was serving my local net. But that's not practical for testing.

I have just tried to reliably trigger the NAT leakage but failed. Perhaps someone else can help here!? What works very well is the following: I start Internet Explorer 11 on the Windows client and go to https://www.msn.com (that's definitely the most painful part). After a few Ctrl-F5 page reloads packets show up in tcpdump.

I uploaded a small example output of tcpdump -i br0 -nn -l net 192.168.0.0/24 of such a session with IE11: https://fam.tuwien.ac.at/~schamane/tmp/tl-wr1043ndv4-tcpdump-170223-1.txt
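
To put a number on a saved capture like this one, a small filter can count the lines whose source address is still in the LAN range. This is only a sketch: the function name is made up, and it assumes tcpdump's usual one-line text format ("time IP src.port > dst.port: ..."). Because the capture filter net 192.168.0.0/24 matches source or destination, the helper looks at the source field only.

```shell
# Hypothetical helper: read tcpdump text output on stdin and print how many
# IPv4 packets have a source address inside 192.168.0.0/24.
# Assumes field 2 is "IP" and field 3 is "<src-ip>[.<src-port>]".
count_leaked() {
    awk '$2 == "IP" && $3 ~ /^192\.168\.0\.[0-9]+(\.[0-9]+)?$/ { n++ }
         END { print n + 0 }'
}
```

Usage would be something like count_leaked < dump.txt, where dump.txt stands for a saved tcpdump text log.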

@DjiPi Thanks for the pointer to "NAT leakage on Ubuntu". Indeed, I have seen this before and it was helpful in setting things up. But the bug itself has long been fixed. Also, LEDE by default drops invalid packets. I have also tried filtering (-j DROP) the packets by means of custom iptables rules on the router. But nothing worked. However, since even flashing didn't change anything this is to be expected.

Still, if anyone could prove me wrong I'd be very happy.

andreas · February 24, 2017, 9:15am

The German TP-Link Support just replied to my report of yesterday where I also pointed them at our thread here. They said they'll forward my report to their "developers".

jow · February 24, 2017, 9:28am

NAT leakage typically occurs when WAN and LAN are solely isolated by port-based switch VLANs and when these VLANs are not properly set up by the boot loader when bringing up the Ethernet.

LEDE only has a chance to enable this isolation once it is fully booted, which is typically far too late: by then, undesired traffic (like DHCP requests from LAN clients) has already made it through the unconfigured switch to the upstream network.

This particular problem has been present on various devices in the past and LEDE worked around it by using a 2nd stage boot loader which programmed the appropriate switch registers before booting the actual system, see e.g. https://git.lede-project.org/?p=source.git;a=commitdiff;h=b8730517068d6ab2850656220e5bf024ab25da83

The proper place to fix this bug is in the bootloader. Neither fixes to the OEM firmware nor any fixes to LEDE will fully solve this problem.

andreas · February 24, 2017, 11:22am

@jow If I am not mistaken you are referring to boot up time whereas I am observing general TCP traffic from clients long after booting -- see my tcpdump for an example.

oleg-umnik · February 25, 2017, 5:55am

The problem and proposed fix (kind of) is described here:

http://www.smythies.com/~doug/network/iptables_notes/index.html

It has absolutely nothing to do with LEDE, hardware NAT or VLAN configuration.

DjiPi · February 25, 2017, 7:17pm

I think that @andreas already checked that information and it did not help (see preceding posts).

MrM40 · March 13, 2017, 9:42pm

How serious is this problem? Should I be worried if I don't do anything, e.g. don't add extra rules to the FORWARD chain?

And this is not only related to the 1043ND, right? It is related to all routers (it's a "problem" with Linux)?

These packets will hit the gateway of my ISP and probably not go any further, right?

Can an intruder use these packets to do any harm from outside, e.g. will the NAT be open in some way?

What else should I be worried about in regards to this leak?

andreas · March 13, 2017, 10:22pm

Hi folks,
It seems I was able to find a cause for the NAT leakage. I am currently testing a new setup with a workaround, and will hopefully be able to provide details tomorrow.

@MrM40 So far, I have only tested 2 TL-WR1043ND v4. My new findings, though, suggest indeed that there's a bug in LEDE that potentially affects other devices.

That depends on the network setup. Theoretically, the packets can end up anywhere. Generally, the problem with NAT leakage AFAIK (I am no security expert) is disclosure of your private network infrastructure. Personally, I consider this a critical vulnerability. And I am surprised that TP-Link did not react accordingly to my reports.

The packets that I observed are generally packets that were supposed to be sent to the WAN anyway. What is disclosed is only the source IP of your private network. The firewall is not affected. However, since the packets have an incorrect source IP any reply to the packets will not reach your router and will be lost. In the worst case, a reply might end up elsewhere. Then data that was meant for you could be disclosed to third parties.

Moreover, especially since we do not know what causes the leakage and what triggers it, there is a potential performance issue when a considerable number of packets get lost.

andreas · March 14, 2017, 12:17pm

My overnight monitoring showed no NAT leakage with the new setup, so here we go:

TL;DR: LEDE 17.01 is not dropping invalid packets. Add a custom rule like iptables -I forwarding_rule -m state --state INVALID -j DROP (no warranties included)

LuCI has an option "Drop invalid packets"; however, I couldn't find a corresponding rule in the output of iptables -L, see my bug report "Drop invalid packets doesn't do anything" (#1068). I knew that not dropping invalid packets can cause NAT leakage. So I added iptables -I forwarding_rule -m state --state INVALID -j DROP to the custom rules at /cgi-bin/luci/admin/network/firewall/custom (i.e. /etc/firewall.user). With this line added I monitored the traffic of the TL-WR1043ND in 2 different settings for about 40h without a single leaked packet.
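
As a sketch, the custom rule could be written into /etc/firewall.user like this. The iptables rule is exactly the one quoted above; the wrapper function name and the iptables -C existence check (which keeps the rule from being inserted twice) are illustrative additions that were not part of the tested setup.

```shell
# Sketch for /etc/firewall.user, the custom-rules script named above.
# The rule is the one from this post; the function wrapper and the
# "iptables -C" check are illustrative additions.
drop_invalid_forward() {
    # -C exits non-zero when the rule is not in the chain yet
    iptables -C forwarding_rule -m state --state INVALID -j DROP 2>/dev/null ||
        iptables -I forwarding_rule -m state --state INVALID -j DROP
}
```

The function would have to be called once at the end of the file. Afterwards, iptables -L forwarding_rule -n -v should list the DROP rule, and its packet counter gives an idea of how many invalid packets were caught.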

Mind you that I have always only looked at IPv4 traffic. I cannot speak for IPv6. In the bug report jow- wrote that there is a possible interference with "IPv6 multicast traffic".

Not dropping invalid packets was a change committed to LEDE in August 2016 (see the bug report for details). This explains why the rule is missing in LEDE. It does not explain the NAT leakage that occurs with TP-Link's stock firmware, but I am guessing the reasons are similar. Unfortunately, I never heard from TP-Link support again.

What I am still very curious about is how the invalid packets can be triggered in the first place and why they are invalid. I tried a lot but failed. Yet, just running arbitrary clients always led to NAT leakage, generally within several hours.