Qualcomm Fast Path For LEDE

notgood · August 10, 2017, 4:32pm

Here are some test results on the grandpa of them all... Linksys WRT54G v2.0, alive and kicking since year 2003, lucky one with 32MB RAM from factory

NAT speed, iperf with gigabit wired workstations on both sides:

OpenWRT 10.03.1 pushes ~27 Mbit/s, that's the last usable OpenWRT on these beasts, things went down the drain after that
LEDE stable 17.01.2 branch, ~18 Mbit/s, I guess we can "thank" kernel developers for removing kernel cache, and adding general bloat every year
LEDE stable + @dissent1 PR applied makes it to ~32 Mbit/s, using fast_classifier/shortcut_fe combo. Looks promising, may give a second life to this box!
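
For reference, the relative changes between those three readings can be worked out with a short shell sketch. The pct_change helper is a hypothetical name introduced here for illustration, and the inputs are only the approximate iperf figures quoted above.

```shell
#!/bin/sh
# pct_change OLD NEW: print the percentage change from OLD to NEW (one decimal).
# Hypothetical helper for illustration; not part of LEDE or shortcut-fe.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) * 100 / old }'
}

pct_change 27 18   # OpenWRT 10.03.1 -> LEDE 17.01.2
pct_change 18 32   # LEDE 17.01.2    -> LEDE 17.01.2 with fast path
pct_change 27 32   # OpenWRT 10.03.1 -> LEDE 17.01.2 with fast path
```

With these inputs it prints -33.3, 77.8 and 18.5, i.e. roughly a one-third drop from 10.03.1 to 17.01.2 and a gain of about 78% from the patch.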

Btw, tested shortcut-fe-cm, no effect at all.
Also, is it necessary to hardcode the IPv6 dependency for building these modules? It wastes space on the precious 4MB flash

dissent1 · August 10, 2017, 6:24pm

That's strange, does anyone else see the same behavior?

moeller0 · August 10, 2017, 6:37pm

I seem to recall that was a security/robustness decision, see e.g.: https://home.regit.org/2013/03/david-miller-routing-cache-is-dead-now-what/. I am happy to thank, not "thank", the Linux network developers: stable, robust and correct beats fast, at least for my use-cases

Best Regards

notgood · August 10, 2017, 7:07pm

Well, quoting David Miller:
...the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables...
...Google sees hit rates on the order of only 10 percent...
...On simpler systems, cache is effective....

And guess what, in 99% of cases OpenWRT/LEDE is being run on small, resource-constrained devices with a very SIMPLE end-user traffic pattern.
So obviously, while this particular change simplified kernel code, increased security and resistance to DoS attacks, and generally benefited enterprise usage scenarios, at the same time it catastrophically reduced routing performance where it matters a lot for the majority of users: simple and predictable NAT'ed traffic flows on underpowered routers.

moeller0 · August 10, 2017, 7:34pm

And I am certain the typical home user, like my aunt, will prefer being more easily DoS'd than sacrificing 100-100*18/27 = 33.33 % of usable bandwidth... (To resolve the tension, my aunt does not care about either, so I believe outside of the enthusiast space people will simply shrug and potentially buy a more modern router (assuming that does not come already from the ISP))

Ah, come on leave the hyperbole for the campaign trail...

Care to share where your numbers were taken from (assuming it is SFW)?

How about getting an adequately powered router instead?

But... I think I understand your complaint, just thought your phrasing was a bit on the sarcastic side, that's all, no harm intended...

notgood · August 10, 2017, 8:51pm

Well, perhaps my wording was too strong, but a 1/3 performance drop was still a rather major regression. Time goes on, software gets more features, no one expects developers to hand-write assembly code and optimize every single line of code for speed and memory, but come on, you can't cut that harshly.

I assume household edge devices constitute a clear majority of routers on the Internet, seems obvious to me. Improving software on these devices without throwing them out like your average smartphone every 2 years would benefit everyone. And as we see there IS room for improvement: latency (SQM/bufferbloat efforts), throughput (fastpath, enabling hardware accelerators) and security (frequent firmware updates with a mainline kernel instead of vendors' prehistoric out-of-tree blobs).

Anyway, no offence intended and none taken, I definitely respect kernel developers for the work they do.

gwlim · August 10, 2017, 11:47pm

The people working on the Linux Kernel are paid by _.
That is how Linux is going to go.

ypjalt · August 11, 2017, 5:02am

The MWAN3 doesn't work with sfe

ffries · August 11, 2017, 6:48am

A few years ago, I talked with an engineer who boasted that their team had redeveloped part of the kernel to handle traffic using multi-processing and that it was way faster. He was very vague, but I guess he was speaking about fastpath or similar projects. So IMHO any large company has its own forked project. Just my 2 cents: fastpath is very promising, but in the end hardware acceleration always wins.

hnyman · August 11, 2017, 2:29pm

I made a test build of master with Qualcomm Fastpath using @dissent1 patch.
On that build, nlbwmon (new netlink based per-host traffic stats app from @jow ) did not report ipv6 stats but seemed to report ipv4 normally.

New build from the same commit without fastpath, and nlbwmon again reports ipv6 stats.

Looks like fastpath may cause peculiar problems for netlink-related stuff.

(I had earlier noticed similar missing ipv6 data in nlbwmon with 17.01, so this might not be exactly about fastpath, but in general about netlink stats in some conditions.)

philjohn · August 11, 2017, 9:28pm

Yeah, I noticed the same thing - tbf, I'm not that bothered about the nlbwmon so removed it from my build based on your patches (I'm on a totally unmetered connection).

node · August 20, 2017, 3:52pm

gwlim

Any way of developing Fast-Path for mipsel_24kc?

guenti_r · August 22, 2017, 1:58pm

I successfully compiled from latest trunk (MT7621), and also added the "all-in-one" patch from
https://github.com/dissent1/r7800/commit/4ed549dc60f984891fce43d163b77462f07dc025.patch
to pending-4.4 & pending-4.9.
But now I need a HOWTO to enable Fast-Path.

Any help?

philjohn · August 22, 2017, 2:21pm

Fast Path is platform agnostic, it simply offloads processing of traffic with no complex rules out of the kernel networking stack and into a far more optimised path.

Simply apply the patch in the pull request dissent1 has open against the main lede source on github, compile for your arch (ensuring you select the fastpath module in make menuconfig) and flash ... it's as simple as that.

philjohn · August 22, 2017, 2:22pm

It's enabled by default if you selected the correct packages in make menuconfig

can you copy and paste the output of

cat /sys/fast_classifier/debug_info

as that will show us if it's processing anything or not.
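
A minimal sketch of that check as a small script, in case it helps. The fastpath_status name and its messages are made up for illustration; the default path is the one given above, and the function only tests whether that file is readable, it does not interpret the counters.

```shell
#!/bin/sh
# fastpath_status [FILE]: report whether the fast_classifier debug file exists
# and, if so, print it. Hypothetical helper; FILE defaults to the path quoted
# in this thread.
fastpath_status() {
    file="${1:-/sys/fast_classifier/debug_info}"
    if [ -r "$file" ]; then
        echo "present: $file"
        cat "$file"
    else
        echo "missing: $file"
        return 1
    fi
}

# On the router itself: a missing file suggests the module is not loaded.
fastpath_status || echo "fast_classifier does not appear to be loaded"
```

If the file is missing, the likely cause is that the kernel modules were not built into the image at all.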

moeller0 · August 22, 2017, 2:42pm

So, I have not actually looked at the code, but...
I would expect that not using the kernel stack will force the user to give up a few of the bells and whistles the kernel stack offers. That might still be a decent trade-off, but saying Fast Path offers "a far more optimised path" seems to imply the kernel stack would not also be optimized (to some degree).
Does anybody know the chances of getting that module up-streamed into the kernel proper?

Best Regards

guenti_r · August 22, 2017, 2:43pm

Thanks for your help.

Clarification:
Copy the patch to target/linux/generic/pending-4.9
make menuconfig (there are no such packages!)

cat /sys/fast_classifier/debug_info shows:
cat: can't open '/sys/fast_classifier/debug_info': No such file or directory

philjohn · August 22, 2017, 3:03pm

No, you need to download the patch from github (posted a few replies above) and then apply that to your checkout of the lede source, e.g.

wget https://patch-diff.githubusercontent.com/raw/lede-project/source/pull/1269.patch
git apply --ignore-space-change --ignore-whitespace 1269.patch

then in menuconfig

Kernel Modules > Network Support > kmod-fast-classifier and kmod-shortcut-fe (not kmod-shortcut-fe-cm)
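
If you prefer not to click through menuconfig, the same selection can probably be scripted. This is only a sketch: select_fastpath is a hypothetical helper, and the symbol names assume the usual CONFIG_PACKAGE_<package> convention together with the two package names above, which only exist once the patch has been applied.

```shell
#!/bin/sh
# select_fastpath CONFIG_FILE: append the two package selections to a build
# config. Hypothetical helper; symbol names are an assumption based on the
# CONFIG_PACKAGE_<package> convention and the package names in this thread.
select_fastpath() {
    cfg="$1"
    for pkg in kmod-fast-classifier kmod-shortcut-fe; do
        line="CONFIG_PACKAGE_${pkg}=y"
        # Add the line only if it is not already present (safe to re-run).
        grep -qxF "$line" "$cfg" 2>/dev/null || echo "$line" >> "$cfg"
    done
}

# Usage, from the top of the patched source tree:
#   select_fastpath .config && make defconfig
```

Running make defconfig afterwards lets the build system expand the config and pull in dependencies; if the two lines disappear from .config at that point, the patch did not apply cleanly.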

philjohn · August 22, 2017, 3:05pm

Anything too complex for simple offloading is still handled by the kernel. It's quite clever: it hooks into the network stack so it gets notified of any routing table changes, and anything it can handle it does, while anything it can't it lets continue through the default stack.

And that's correct, the Linux Kernel Network Stack is a very generic networking stack that's meant to be used by a wide variety of different devices. If it was massively optimised we wouldn't need fast-path or hardware network acceleration.

loyukfai · August 22, 2017, 3:30pm

Maybe @dissent1 can try the netdev mailing list? (https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt)

Cheers.