10GigE and higher routing/shaping discussion

In this thread about SQM speeds we started discussing the related topic of high-speed connections, such as 10GigE+ connections: WRT32x SQM speed capabilites?

When it comes to 10GE and up, you really start to run out of clock cycles per packet. For a 10Gig connection, 1500-byte packets, and a 3 GHz CPU you have

1500*8/1e10 = 1.2us / packet

and

1.2us/packet * 3e9 cycles/s = 3600 cycles / packet.

You can imagine that just copying 1500 bytes takes a pretty decent fraction of those 3600 cycles (even at, say, 8 bytes per cycle it's ~200 cycles), so processing iptables at this kind of packet rate is probably a serious issue. Conntrack and so forth become liabilities, I think.
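To make that budget concrete, here's a quick back-of-envelope calculation in Python (the 3 GHz clock and the 8 bytes/cycle copy rate are just the illustrative assumptions from above):

```python
# Back-of-envelope cycle budget per packet at 10 Gbps (illustrative numbers).
LINK_BPS = 10e9            # 10 Gbps link
PKT_BYTES = 1500           # MTU-sized packet
CPU_HZ = 3e9               # assumed 3 GHz CPU
COPY_BYTES_PER_CYCLE = 8   # optimistic copy-throughput assumption

time_per_pkt = PKT_BYTES * 8 / LINK_BPS         # seconds per packet
cycles_per_pkt = time_per_pkt * CPU_HZ          # total cycle budget
copy_cycles = PKT_BYTES / COPY_BYTES_PER_CYCLE  # cycles just to copy the payload

print(f"{time_per_pkt * 1e6:.2f} us/packet")          # 1.20 us/packet
print(f"{cycles_per_pkt:.0f} cycles/packet budget")   # 3600 cycles
print(f"{copy_cycles:.0f} cycles just for the copy")  # ~188 cycles
```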

Software flow offload is probably going to be required even for x86-based routers, and SQM seems unlikely to work well at 10Gig for prices less than what an enterprise can pay (like $15,000). If it's single-threaded, it may not even be possible.

Fortunately at these speeds, short of a DoS, it's probably pretty easy to get low latency via simplistic QoS, like weighted round robin or deficit round robin, provided you tag your low-latency traffic properly. Hardware-based switches can handle some of these issues in hardware using such simplistic algorithms (some calcs here: WRT32x SQM speed capabilites?).
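For the curious, here's a minimal deficit round robin sketch in Python; the queue names, packet sizes, and quantum values are hypothetical, just to show why the algorithm is cheap enough for hardware:

```python
from collections import deque

# Minimal deficit round robin (DRR) sketch: each queue earns a byte
# "quantum" per round; a larger quantum means a larger bandwidth share.
# Queue names, contents, and quanta are hypothetical.
queues = {
    "low_latency": deque([64, 64, 128]),       # packet sizes in bytes
    "bulk":        deque([1500, 1500, 1500]),
}
quantum = {"low_latency": 300, "bulk": 1500}
deficit = {name: 0 for name in queues}

def drr_round():
    """Serve each non-empty queue once, up to its accumulated byte deficit."""
    sent = []
    for name, q in queues.items():
        if not q:
            continue
        deficit[name] += quantum[name]
        while q and q[0] <= deficit[name]:
            pkt = q.popleft()
            deficit[name] -= pkt
            sent.append((name, pkt))
        if not q:
            deficit[name] = 0  # classic DRR resets the deficit of an emptied queue
    return sent

print(drr_round())
# [('low_latency', 64), ('low_latency', 64), ('low_latency', 128), ('bulk', 1500)]
```

The per-packet work is a couple of comparisons and additions, which is why switch ASICs can run this at line rate.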

Anyway, I set up this thread so people could discuss the topic of truly high-bandwidth links. How would you set up your network today to handle 10GE to your home, like those wacky Swiss and Swedes have :wink:

As a point to consider, packets per second, not bits, becomes the interesting metric for high-performance routing. It's a lot easier to push 1 Gbps of 1500-byte packets than it is when they're smaller (think about "real-time" applications and their associated control/ACK packets, as one example).

See, for example, https://bsdrp.net/documentation/technical_docs/performance
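A quick illustration of the pps scaling (assuming the usual 20 bytes of on-wire Ethernet overhead per frame: preamble, start-of-frame delimiter, and inter-frame gap):

```python
# Packets per second needed to saturate a link at various frame sizes.
# Each frame also costs 20 bytes on the wire (preamble + SFD + inter-frame gap).
WIRE_OVERHEAD = 20

def pps(link_bps, frame_bytes):
    return link_bps / ((frame_bytes + WIRE_OVERHEAD) * 8)

for size in (64, 200, 1500):
    print(f"{size:>5}B frames: {pps(1e9, size)/1e6:6.2f} Mpps @ 1G, "
          f"{pps(10e9, size)/1e6:6.2f} Mpps @ 10G")
# 64B frames need ~1.49 Mpps at 1G and ~14.88 Mpps at 10G;
# 1500B frames need only ~0.82 Mpps even at 10G.
```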


Yes, I'm guessing that even 800,000+ pps at 1500 bytes is already pretty hard, and it only gets worse if you try to saturate the link with 200-byte packets (roughly 6M pps)... But then, how many people run 100,000-employee call centers out of their garage :slight_smile:

that thought was and always will be "applicable", and it has gotten us nowhere.

as said, the jitter contribution of a single 1500-byte frame is going to decline further as link rates rise, but that drives up pps, so jumbo frames or other aggregates will likely be required (quick numbers below).
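a rough sketch of both effects (just back-of-envelope arithmetic, nothing vendor-specific):

```python
# Per-frame serialization time shrinks as link rates rise (less jitter from
# a single 1500B frame in the pipe), but the pps needed to fill the link
# grows; 9000B jumbo frames claw most of that pps increase back.
for rate_gbps in (1, 10, 40, 100):
    bps = rate_gbps * 1e9
    serialize_us = 1500 * 8 / bps * 1e6
    pps_1500 = bps / (1500 * 8)
    pps_9000 = bps / (9000 * 8)
    print(f"{rate_gbps:>3}G: 1500B frame = {serialize_us:6.2f} us on the wire, "
          f"{pps_1500/1e6:5.2f} Mpps @1500B vs {pps_9000/1e6:5.2f} Mpps @9000B")
```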

completely agree on the emphasis of pps vs. Mbit/s.
there are a lot of (a few years old) talks from Red Hat and Netflix folks regarding scaling in this area.

There is also talk about doing something akin to fq_codel in the NICs themselves, and there is the L4S project (https://riteproject.eu/dctth/, see https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-03) trying to move the queues out of the network and back into the endpoints, thereby severely reducing unwanted, under-managed buffering (IMHO the goal is laudable, I am just not sure whether the abuse of ECT(1) is the right approach for that, but I digress). There is also the somewhat competing (for the use of the ECT(1) ECN codepoint) SCE proposal (https://tools.ietf.org/html/draft-morton-taht-sce-00) that also aims at making congestion in the network rarer. In short, IMHO the challenges of routing at high speeds are actively being tackled, hopefully resulting in good-enough solutions before >>1 Gbps links become congested :wink:
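For reference, the contested codepoint is one of the four values of the two-bit ECN field in the IP header; here's a quick decoder sketch, nothing more:

```python
# The ECN field is the low two bits of the IP TOS / Traffic Class byte
# (RFC 3168). ECT(1) = 0b01 is the codepoint that both L4S and SCE want
# to repurpose, which is why the two proposals compete.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

def ecn_of(tos_byte):
    """Return the ECN codepoint name encoded in a TOS/TC byte."""
    return ECN_NAMES[tos_byte & 0b11]

print(ecn_of(0x01))  # ECT(1)
print(ecn_of(0x03))  # CE (congestion experienced)
```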

I also think that for the foreseeable future (~10 years or so) 1 Gbps should be more than most home networks realistically require, so I believe the "don't do AQM, but also don't congest the link" approach might actually go a long way, giving traffic-shaping tech time to catch up...

BTW, relevant to this discussion: http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html showing that, for a sufficiently specific definition of "can", Linux already could manage sending (through the qdisc layer!) at the ~14M packets per second rate required for maximum throughput at minimum packet size @10Gbps in 2014... assuming one had 11 Xeon CPUs to burn on this...
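That 11-CPU requirement is easy to rationalize with the earlier cycle budget (3 GHz clock assumed, same as above):

```python
# At ~14.88 Mpps (64B frames saturating 10 Gbps) the per-core cycle budget
# collapses; spreading the load across cores buys it back. 3 GHz assumed.
CPU_HZ = 3e9
MPPS = 14.88e6
print(f"{CPU_HZ / MPPS:.0f} cycles/packet on one core")       # ~202
print(f"{11 * CPU_HZ / MPPS:.0f} cycles/packet on 11 cores")  # ~2218
```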

Let's not forget jumbo frames, especially at those speeds.

These are WAN connections; can we push jumbo frames across the internet?


generally: no
under some circumstances: maybe
but core networks use them heavily AFAIK, as they allow one to tunnel "normal-size" customer traffic without fragmentation despite the added tunnel header.

but using jumbo frames on fast connections to local CDN caches seems sensible to me...
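the tunneling point is just MTU arithmetic; the header sizes below are typical examples, not exact for every encapsulation:

```python
# A 1500B customer frame plus a typical tunnel header still fits easily
# inside a 9000B jumbo MTU, so the core never has to fragment.
# Overheads are typical figures (GRE: outer IPv4 + GRE header; VXLAN:
# outer Ethernet + IPv4 + UDP + VXLAN; MPLS: two 4-byte labels).
CUSTOMER_MTU = 1500
JUMBO_MTU = 9000
TUNNEL_OVERHEAD = {"GRE": 24, "VXLAN": 50, "MPLS (2 labels)": 8}

for name, overhead in TUNNEL_OVERHEAD.items():
    total = CUSTOMER_MTU + overhead
    print(f"{name}: {total}B on the wire, fits in jumbo: {total <= JUMBO_MTU}")
```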


It turns out that GRO/GSO already give a decent part of the advantage of jumbo packets, with better backward compatibility. The idea of treating a bunch of consecutive packets as one unit while doing routing look-ups already significantly reduces the kernel load... I am not sure how much that is going to help with >= 10 Gbps networks, though (especially with frequency scaling pretty much dead, it is not very likely that CPU and memory frequencies increase enough to keep the current ~40,000 cycles per packet constant).
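A toy model of the GRO idea (not kernel code, just illustrating why batching amortizes the per-packet work):

```python
from itertools import groupby

# Toy model of GRO: merge runs of consecutive same-flow packets into one
# aggregate so the routing lookup runs once per batch instead of once per
# packet. (Illustrative only; the real kernel merges at the sk_buff level.)
packets = [("flowA", 1500)] * 8 + [("flowB", 1500)] * 4  # (flow, bytes)

lookups_without_gro = len(packets)
lookups_with_gro = sum(1 for _flow, _run in groupby(packets, key=lambda p: p[0]))

print(f"lookups without GRO: {lookups_without_gro}")  # 12
print(f"lookups with GRO:    {lookups_with_gro}")     # 2
```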