Limitations on connections in ISP equipment (was LEDE packet per second performance versus other firmware)

I run a BIND server for both local queries and a few domains (it is only one member of a nameserver cluster, though), so that is more great advice to consider. However, I do not recall port 53 queries being a large part of the session table. Honestly, I would pay $500 a month or whatever for a real connection, but what I have is the only type of option I have ever been aware of being available. One exception: someone did try to sell me a T1 line with absolutely horrible speeds a few years back.

I would prefer OVH, as they already host the mail servers that I use; I could not self-host these without a reverse DNS record and a static IP, which was not even an option prior to fiber from ATT. I assumed this bridge would be done with a VPN (incorrect: I skimmed the post at first and now see that something called a GRE tunnel is an option). This seems like a nuclear option, but it is another option.

RE VPN: nah, VPN is encrypted, you're just trying to route around connection tracking tables, you can set up an unencrypted GRE tunnel, it will just send your traffic encapsulated in one connection between your router and your VPS. It's literally adding a few lines of configuration to set up the tunnel interfaces, and then a few lines of policy routing: everything from the bridge node goes over the tunnel. It's not trivial, but it doesn't require a lot of heavy lifting or configuration like setting up certificates or the like. If you have quite a few people using a single mail server or the like, you could route them all down the GRE tunnel as well, particularly if you run the GRE tunnel off the same VPS. Then instead of 100 or 200 connections to the same IMAP server you've just got one tunnel.
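To give a feel for how few lines that is, here is a rough sketch that just prints the kind of iproute2 commands involved on the router side. All addresses, the interface name `gre1`, and routing table number 100 are made-up placeholders, and the VPS needs the mirror-image tunnel plus its own forwarding/NAT setup, which is not shown. Treat it as an illustration of the idea, not a tested configuration for any particular router.

```python
def gre_tunnel_commands(local_ip, remote_ip, tunnel_addr, bridge_node_ip,
                        ifname="gre1", table=100):
    """Return iproute2 commands for a GRE tunnel plus simple policy routing.

    Every argument is a placeholder chosen for illustration; nothing here
    comes from a real deployment.
    """
    return [
        # Create the unencrypted GRE tunnel between the router and the VPS.
        f"ip tunnel add {ifname} mode gre local {local_ip} remote {remote_ip} ttl 255",
        # Give the tunnel interface an address and bring it up.
        f"ip addr add {tunnel_addr} dev {ifname}",
        f"ip link set {ifname} up",
        # Policy routing: traffic from the bridge node uses its own table...
        f"ip rule add from {bridge_node_ip} lookup {table}",
        # ...and that table's only route is a default route via the tunnel.
        f"ip route add default dev {ifname} table {table}",
    ]


if __name__ == "__main__":
    # Documentation-range addresses, purely hypothetical.
    for cmd in gre_tunnel_commands("192.0.2.10", "198.51.100.20",
                                   "10.99.0.1/30", "192.168.1.50"):
        print(cmd)
```

The upstream gateway then sees a single GRE flow to the VPS, no matter how many connections the bridge node opens inside the tunnel.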

RE port 53, I've got about 50 connections in my NAT table for port 53, out of a total of 230 right now. I actually just bumped my dnsmasq cache size to 1500 to help avoid this problem now that I know about it :wink:

The 50 connections seems to be because dnsmasq uses random source ports when querying upstream for security reasons.

I've also got a lot of connections in my NAT table that are left over from mobile phones that probably dropped connections when the wifi was not great. The connection just needs to timeout and it's currently at 27000 seconds or whatever, seems like maybe 8 hours is the timeout duration as nothing in the table goes above about 28800 seconds. If you have a lot of intermittently connected devices on your LAN, this could pose an issue.
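If you want to check the same things on your own Linux-based router (not on the ISP gateway, whose table you can only see in its web UI), something like the sketch below can summarize a connection tracking dump. It assumes the usual `/proc/net/nf_conntrack` line layout, where the fifth whitespace-separated field is the remaining timeout in seconds and the first `dport=` belongs to the original direction. The sample lines are invented.

```python
import re
from collections import Counter


def summarize_conntrack(lines):
    """Count entries per original destination port and find the largest timeout."""
    by_dport = Counter()
    max_timeout = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        # Assumed layout: l3 name, l3 number, l4 name, l4 number, timeout, ...
        try:
            timeout = int(fields[4])
        except ValueError:
            continue
        max_timeout = max(max_timeout, timeout)
        # The first dport= on the line is the original (outbound) direction.
        match = re.search(r"dport=(\d+)", line)
        if match:
            by_dport[int(match.group(1))] += 1
    return by_dport, max_timeout


# Invented sample entries: one DNS query and one long-lived IMAPS session.
sample = [
    "ipv4     2 udp      17 25 src=192.168.1.20 dst=8.8.8.8 sport=40001 dport=53 "
    "src=8.8.8.8 dst=203.0.113.5 sport=53 dport=40001 mark=0 use=2",
    "ipv4     2 tcp      6 28790 ESTABLISHED src=192.168.1.31 dst=198.51.100.7 "
    "sport=51000 dport=993 src=198.51.100.7 dst=203.0.113.5 sport=993 dport=51000 "
    "[ASSURED] mark=0 use=2",
]

if __name__ == "__main__":
    ports, longest = summarize_conntrack(sample)
    print("entries per destination port:", dict(ports))
    print("largest remaining timeout (s):", longest)
```

On a real router you would feed it the lines of the actual file instead of `sample`.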

Turns out a lot of people have had this issue and some have come up with ingenious packet magic to get around it.

Here's a blog post:

http://blog.0xpebbles.org/Bypassing-At-t-U-verse-hardware-NAT-table-limits

at the bottom in some comments is a Python based solution that just proxies the authentication packets only:

I'd call this high level network magic, so lots of people won't really want to go this route, but thought I'd at least post here for posterity.

Thanks. Saw that over the weekend (but not the GitHub script, so I greatly appreciate that). Here is another link that references the above: https://community.ubnt.com/t5/EdgeMAX-Stories/Bypassing-AT-amp-T-Fiber-Gateway-with-Edgerouter-Lite-newbie/cns-p/1862846 . I seriously considered doing this. The issue is that some of those commenting say the connection is not stable for more than a few days. I will post back if I give this a try, should a static IP not work.

It is an excellent idea though. Basically just some simple MAC spoofing after using the ATT router to authenticate the connection. I suspect it will work despite some having issues keeping a connection up. ATT IP blocks are only $10 though.

Another option that I am looking at is setting up the "cascaded router option." Some posts (https://forums.att.com/t5/AT-T-Internet-Equipment/Strict-NAT-Bridge-Mode-What-is-IP-Passthrough-Can-I-enable-on-my/td-p/5296974) seem to indicate that the cascaded router option can bypass the NAT table without a static IP block.

After setting my dnsmasq instance to use a restricted set of query ports, and of course iptables to force all LAN nodes to use my dnsmasq for DNS resolution, the number of port 53 entries dropped to around 10 from some much higher number (maybe hundreds). Normally Android phones insist on using 8.8.8.8 and 8.8.4.4 as their DNS, and they ignore what's handed out by DHCP, so if you have N Android phones on your LAN, and each one loads a web page with K links, you'll have N*K DNS connections to google DNS. For N=10 and K=100 that's 1000 connections right there. These connections are at least short lived in the NAT table, but if you can prevent this problem it could help a lot.
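The back-of-envelope arithmetic above can be written down as a tiny model. This is only a sketch: it assumes every lookup holds its own table entry at the same moment (a worst case), and that with a local cache the upstream entries are bounded by source ports times upstream servers. The specific port and server counts are illustrative guesses, not measurements.

```python
def dns_entries_direct(n_clients, lookups_per_page):
    """Worst case when every client queries public DNS itself: N * K entries."""
    return n_clients * lookups_per_page


def dns_entries_cached(upstream_ports, upstream_servers):
    """Upper bound with a local cache: one entry per (source port, server) pair."""
    return upstream_ports * upstream_servers


def table_share(entries, table_size):
    """Fraction of the gateway's NAT table used by those entries."""
    return entries / table_size


if __name__ == "__main__":
    direct = dns_entries_direct(10, 100)   # N=10 phones, K=100 lookups each
    cached = dns_entries_cached(5, 2)      # assumed: 5 query ports, 2 upstreams
    print(direct, f"{table_share(direct, 2048):.0%} of a 2048-entry table")
    print(cached, f"{table_share(cached, 2048):.1%} of a 2048-entry table")
```

With the N=10, K=100 example, the uncached case could eat close to half of a 2048-entry table just for DNS, while the cached case stays at around ten entries.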

I can say that on a relatively quiet network, after a reboot of my modem, there are around 100 connections just for a home network with about 10 devices even with the port 53 forwarding to my caching server.

If I were running a small to medium office and everyone were using the same email server, and/or the same server for additional infrastructure (cloud drives, or whatever) I'd be very tempted to tunnel all that common infrastructure that's not latency sensitive over a GRE tunnel direct to the machine where it's sitting. This would ensure you could open unlimited numbers of email connections and the like without any limitation from the router. I'd also run a squid proxy for web connections on your LAN. Squid will reuse single connections to request additional resources from the same server I believe.

If I were going to offer something like a guest network such as the one you are trying to run, I'd probably also tunnel all of that guest network to a VPS as well just so that unlimited numbers of guests would occupy just one connection on the router.

Doing those two things should vastly expand the usability of your network given this limitation. (also getting an NVG 599 with 4096 NAT entries might help).

The eap proxy stuff sounds ... well a little more delicate to tell the truth.


The NAT table limit on the NVG595 magically changed to 2560 (from 2048). Fine-tuning a variety of things has also reduced the normal session count, but that is obviously not a real solution. I will post again when I try the static IP block, to report whether it bypasses the ATT router NAT table.

The thing that I find strange is that even though i'm using ipv6 heavily, all the ipv6 connections are also listed in this table, and I don't know if it will max out when I use more ipv6 connections. I'm trying to switch to ipv6 for everything, because honestly it's just WAY better technology than NAT. I manage that by setting up NAT64 for everything but web, and using a squid proxy for web connections. If I were in charge at ATT I'd have rolled out fiber as ipv6 only with NAT64 hardware at every neighborhood switching node. This would be similar to the wireless network that T-Mobile has already rolled out. Just one set of issues to deal with, and ATT already has ipv6 pretty widely deployed. Oh well.

Also, I'd have placed a raw ethernet bridge instead of these ARRIS devices to do the 802.1x authentication, and then sold people routers if they didn't already have them and wanted them. Way less hassle to say "we're only responsible for giving you an ethernet connection with an ipv6 subnet, you're free to get whatever router you like and there are hundreds to choose from, but if you want us to set one up, just pay $X one time and we'll sell you our standard one..."

Of course if they did this probably several tens of millions of businesses would drop their $1000/mo business connections and sign up for $100/mo fiber connections. There is no way ATT is going to compete with itself so I assume they are purposefully hampering their service.

I am aware that this may not be the correct place for this discussion, but would you mind elaborating on how IPv6 works? With NAT, one uses their own DHCP server to hand out LAN addresses that all share a WAN address. The way I understand IPv6, an internet service provider's DHCP server issues the IPv6 addresses, thereby negating the need for (and possibly not even allowing) one's own IPv6 DHCP server. Alternatively, I was thinking an ISP DHCP server issues the first half of the IPv6 address and a local DHCP server issues the remainder, but I am not clear on how IPv6 works in general.

It's worth doing some reading on; I'll give you the gist:

ipv6 addresses are 64 bits of network address followed by 64 bits of host address. The network address is called the "prefix".

First off in ipv6, routers advertise themselves on the network and advertise what prefixes they route for. So as soon as you plug in a device to an ipv6 network it asks "what routers are out there" and the routers say "here I am, I route for prefix XXX and you should use address YYY for DNS". Then periodically the routers just send out beacons "I'm still here, routing for prefix XXX, please use YYY for DNS"

Typically the next step is called "SLAAC" for stateless address autoconfiguration which means that a machine that hears a prefix advertised generates a host portion using their MAC address and then tacks it onto the prefix... and voila they're on the network. So there's no DHCP involved. There's also "privacy" addresses where essentially you just use a random number generator to generate a host address. Finally there is a DHCP system but it's questionable whether it's a good idea. Typically machines have MULTIPLE ipv6 addresses and this is totally normal in ipv6.
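To make the MAC-based variant concrete, here is a small sketch of how a host portion can be derived from a MAC address (the "modified EUI-64" scheme: insert `ff:fe` in the middle of the MAC and flip the universal/local bit of the first byte). The prefix is from the documentation range and the MAC is made up. Many current systems use privacy or otherwise randomized host portions instead, so this shows only one of the options mentioned above.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix with a modified EUI-64 host part built from a MAC."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch expects a /64 prefix")
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a MAC like 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC to get 64 bits.
    host = bytes(octets[:3] + [0xFF, 0xFE] + octets[3:])
    return ipaddress.IPv6Address(
        int(net.network_address) | int.from_bytes(host, "big")
    )


if __name__ == "__main__":
    # Hypothetical advertised prefix and MAC address.
    print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
```

The first 64 bits come straight from the router advertisement and the last 64 bits from the device itself, which is why no DHCP server has to hand anything out.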

Of course there is an ipv6 DHCP method for handing out prefixes as well, and this is usually used to get a block of prefixes (called prefix delegation). So typically your router asks "hello I want a prefix block of length /56" and then the ISP says "here's the first 56 bits that we own, do what you want with the remaining 8 bits" and so your router can then assign 1 network prefix to each of its internal LAN subnets, up to 256 of them, or you might just ask for a few /64 prefixes, one for each subnet you have. For larger organizations they can get /48 delegated, and for small ISPs they'll typically get a /32 prefix. Big guys like Google or ATT typically have something like /28 or even a /24 or some such thing
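The /56 arithmetic can be checked with Python's standard `ipaddress` module. This is just a sketch of what a router could do with a delegated block; the delegated prefix below is from the documentation range, and real routers assign their LAN prefixes in whatever order their firmware chooses.

```python
import ipaddress


def lan_prefixes(delegated, count):
    """Return how many /64s fit in a delegated block, and the first `count` of them."""
    block = ipaddress.IPv6Network(delegated)
    if block.prefixlen > 64:
        raise ValueError("delegation is smaller than a single /64")
    available = 2 ** (64 - block.prefixlen)  # /56 leaves 8 bits -> 256 subnets
    if count > available:
        raise ValueError("not enough /64 prefixes in this delegation")
    subnets = block.subnets(new_prefix=64)
    return available, [next(subnets) for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical /56 handed out by the ISP; three internal LAN subnets.
    available, first = lan_prefixes("2001:db8:ab00::/56", 3)
    print(available, "possible /64 subnets")
    for net in first:
        print(net)
```

Each of those /64 prefixes is then what the router advertises on one internal subnet.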

Huge advantages include that it's trivial to get a machine on an ipv6 network, just plug it in. Also there is no NAT at all, so you get a public prefix, and then your machines on your network get public ipv6 addresses. This means that your router/firewall needs to have a policy in place to prevent inbound traffic, but it also means that if you want to let people directly connect to your machines, it's possible whereas with NAT forget it. That solves huge headaches for all sorts of things. This is quite honestly the big deal about ipv6: that everyone gets as many public addresses as they need. This enables network connections between any two devices without "second class citizens" behind NAT who need to jump through hoops using special techniques to accept inbound connections.

EDIT: in particular, the vagaries of NAT pretty much destroy the reliability of IP telephones, whereas with ipv6 and a simple firewall rule, you can allow inbound telephone connections to any telephone on your LAN thereby ensuring that if a SIP connection is interrupted and a SIP proxy tries to re-open the connection, it will be able to rather than dropping the call. My calls used to drop all the time, and now that I run ipv6 with a softphone that supports it... my calls stay solid all the time.

Netgear R7800 can drive a gigabit from lan to wan without breaking a sweat - you don't need anything much more powerful.

In my opinion an x86 machine is the best option for today and even more so in the near future. High-end routers (R7800, Archer C2600, Linksys WRT3200...) can handle fiber connections with no problems, but in 2 to 3 years many people will have 500 Mbps or 1 Gbps internet at home or at the office. I'm completely sure an x86 machine will handle it, but a router like the R7800 or C2600... I'm not sure.

I'm always talking about running with SQM switched ON; with it switched OFF, maybe a router can handle it.

At home I have an x86 router (MiniPC Celeron J1900) plus an Archer C7 v2 as AP, and my 300 Mbps fiber is working like a charm.

@dlakelan It is very interesting that IPv6 uses the ATT device NAT table given what you explained about IPv6. Maybe 6rd prefix delegation requires use of the NAT table (the ATT device lists IPv6 as service type "6rd" under "Broadband status"), but using the NAT table really doesn't make sense based on what you stated about IPv6.

Secondly, I switched to an x86 router, as it is apparently called, and it is great. It uses a 2005-era dual-core 2.8 GHz processor, filled with old DDR2 RAM, a new power supply, and two new Intel network cards (one PCI-E and one PCI). So much more control and reporting than Tomato or probably LEDE. That being said, Tomato (which is absolutely not possible, because it only works on what are called "MIPS" processors and a few ARM ones) and LEDE could be excellent software options on an x86 device if they were easier to install in this manner. No more issues with high processor usage with more than one VPN client (like with Tomato), and I suspect it could handle a huge NAT table or a gigabit connection from the WAN if either ever became possible. That said, it isn't the Tomato (or LEDE) software itself that cannot handle any given thing on a router; it is the hardware a Tomato router sits on that limits the processing power, which I now understand and am very clear about.

Highly recommend a hardware router to anyone thinking about it so thank you for the suggestion. Activated static IP block and will eventually try to bypass the NAT table limit of internet service provider required device and post results.

On my NVG599 it used to say 6rd type connection but now it says "native". The 6rd doesn't require NAT, in the sense that it's actually tunneling the ipv6 packets over a single tunnel to a 6rd server somewhere. I personally think that the "firewall" features of the NVG device are what requires the connection table, and this is also true of your router, it needs to know that you made an outbound connection so that it can pass the return traffic. But, there's no excuse for putting such small connection tables. And there's absolutely no need to connection track when you've got the firewall turned off. Instead of 2048 or 4096 it should have something like 65536. Sigh.

Also, by the way, which distribution did you settle on for your PC router? IPFire?

LEDE is available for x86 but LEDE is optimized for small systems. It's not a terrible choice, but it's not totally obvious that it's the best choice either, compared to the embedded hardware consumer routers, where it is the best choice.

IPFire. The BSD variants actually looked better, but I didn't really understand most of the options (and there were a great many of them). While I do agree that LEDE is likely better than Tomato, what kept me from ever moving from Tomato to LEDE (I wanted to because of the Dnsmasq exploit, the old 2.6 kernel that probably has its own exploits and will never be updated, and other things) is that Tomato works great with WiFi. Someone explained that Tomato somehow uses proprietary Broadcom drivers, while LEDE's open source Broadcom drivers do not perform well (based on my understanding of the explanation). This means that LEDE alone makes a great router but may make the WiFi useless on a device requiring Broadcom drivers. It is a whole new world when one views a router as what it apparently should be: a router alone.

After the above experience, I would rather put LEDE, IPFire, or whatever on a hardware router, or at least a dedicated router, and use as access points any devices with strong WiFi capability and proprietary drivers (which unfortunately seem to go hand in hand): from Ubiquiti (which I actually do not recommend, since one has to administer them through a Windows program) to Cisco Merakis (at least one can log in to these through any browser) to Asus RT-N66Us with Tomato.

When I looked into installing LEDE on the hardware router, I looked at this page: https://lede-project.org/docs/instructionset/x86_64

@dlakelan You previously recommended a QoS system earlier in this thread. Bandwidth has never been an issue (I am probably the only one at my firm who will ever use, or even knows what, an FTP/SFTP upload is, and this won't max out 50 megabits per second upstream). Obviously the NAT table limit is. I wish there were a "QoS" system for connections, where one could prioritize connections and limit them to a certain number using the same general rules that QoS for bandwidth currently uses.

@schnappi is SQM working in IPFire?

Thanks,

@Klingon Do not know what SQM is. However QOS in IPFire is turned on with defaults. Tested limiting an FTP upload a few days ago and that worked. These may be helpful:

https://forum.ipfire.org/viewtopic.php?t=13259

https://planet.ipfire.org/post/ipfire-2-13-tech-preview-fighting-bufferbloat

SQM in LEDE is working great (after a little tuning); examples from my LEDE x86 router:

SQM switched OFF (as normal routers do):

Download: 293.8 Mbps
Latency: (in msec, 61 pings, 0.00% packet loss)
Min: 19.545
10pct: 26.162
Median: 79.973
Avg: 85.335
90pct: 163.232
Max: 170.889

SQM switched ON:

Download: 271.85 Mbps
Latency: (in msec, 53 pings, 0.00% packet loss)
Min: 14.245
10pct: 14.534
Median: 15.000
Avg: 15.404
90pct: 15.981
Max: 27.835

Tool used for testing: betterspeedtest.sh, found here: https://www.bufferbloat.net/projects/bloat/wiki/Tests_for_Bufferbloat/

I lose some speed, but the results are impressive; the bufferbloat is "over".
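For anyone who wants to put numbers on that trade-off, here is a quick sketch that compares the two runs posted above. The figures are copied from those results; the helper names are just made up for this example.

```python
# Figures copied from the two test runs posted above.
sqm_off = {"download_mbps": 293.8, "median_ms": 79.973, "p90_ms": 163.232}
sqm_on = {"download_mbps": 271.85, "median_ms": 15.000, "p90_ms": 15.981}


def throughput_cost_pct(off_mbps, on_mbps):
    """Percentage of download throughput given up by enabling SQM."""
    return 100.0 * (off_mbps - on_mbps) / off_mbps


def latency_ratio(off_ms, on_ms):
    """How many times lower the loaded latency is with SQM enabled."""
    return off_ms / on_ms


if __name__ == "__main__":
    cost = throughput_cost_pct(sqm_off["download_mbps"], sqm_on["download_mbps"])
    median = latency_ratio(sqm_off["median_ms"], sqm_on["median_ms"])
    p90 = latency_ratio(sqm_off["p90_ms"], sqm_on["p90_ms"])
    print(f"throughput given up: {cost:.1f}%")
    print(f"median latency under load: {median:.1f}x lower")
    print(f"90th percentile latency: {p90:.1f}x lower")
```

So in this single pair of runs, roughly 7.5% of download speed buys a median latency under load that is about five times lower.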