Qualcomm Fast Path For LEDE

Could anyone who uses SQM try this patch and provide feedback?

BTW, it seems that SFE does not support plainly routed packets (the native IPv6 case), only NATed ones.

No change (I disabled the minify LuCI src option in "make menuconfig" and compiled the firmware with luci-app-nlbwmon):

Mon Aug  7 13:42:19 2017 kern.info kernel: [   48.089858] 
Mon Aug  7 13:42:19 2017 kern.info kernel: [   48.089858] do_page_fault(): sending SIGSEGV to nlbwmon for invalid write access to 00000000
Mon Aug  7 13:42:19 2017 kern.info kernel: [   48.098470] epc = 7765ec64 in libc.so[7763e000+a0000]
Mon Aug  7 13:42:19 2017 kern.info kernel: [   48.103665] ra  = 55b3c1a8 in nlbwmon[55b37000+a000]
Mon Aug  7 13:42:19 2017 kern.info kernel: [   48.108744] 
Mon Aug  7 16:30:31 2017 daemon.err uhttpd[1486]: Error while processing command: Bad file descriptor
Mon Aug  7 16:30:32 2017 daemon.err uhttpd[1486]: Error while processing command: Bad file descriptor
Mon Aug  7 16:30:37 2017 daemon.err uhttpd[1486]: Error while processing command: Bad file descriptor
Mon Aug  7 16:30:37 2017 daemon.err uhttpd[1486]: Error while processing command: Bad file descriptor
Mon Aug  7 16:30:38 2017 daemon.err uhttpd[1486]: Error while processing command: Bad file descriptor

When autostart is disabled (/etc/init.d/nlbwmon disable && reboot && exit) the problem does not exist, but only then; in practice it is not possible to use it without errors.
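For reference, the standard OpenWrt/LEDE init-script commands for that workaround (nothing here is specific to this bug, just the usual service handling):

```shell
# Stop nlbwmon now and keep it from starting at boot
/etc/init.d/nlbwmon stop
/etc/init.d/nlbwmon disable   # removes the /etc/rc.d/ symlinks

# To bring it back later:
# /etc/init.d/nlbwmon enable && /etc/init.d/nlbwmon start
```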
Of course, I built according to your instructions at https://github.com/gwlim/mips74k-ar71xx-lede-patch, minus the patch https://github.com/gwlim/mips74k-ar71xx-lede-patch/blob/lede-17.01/patch/099-add-default-package.patch (I only use a WNDR4300 as a relayd relay; most of those extra patches really messed up my router and I do not really need them, so I configure and add some packages manually in "make menuconfig"), and now also minus the patch https://github.com/gwlim/mips74k-ar71xx-lede-patch/blob/lede-17.01/patch/065-enable-luci-src-diet.patch (as I understood your advice), plus a few external apps/scripts that do not interfere in any way with your modifications/patches/scripts/etc. (my .config: http://wklej.org/id/3231289/).
In general, the problem still exists.

Right now I am compiling a new image. I'll answer you in a few hours.

At first glance, this patch seems to restore the "normal" SQM functionality for me. Great job.

I have used "flent" RRUL test for latency testing:

  • When I tested two days ago without this patch, Fastpath caused the latency under load to move from the normal 20ms to something like 50ms
  • With this patch the latency stays at the expected ~20ms. I tested with both simple/fq_codel and layer_cake/cake. Looks good to me.
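For anyone wanting to reproduce this, a typical RRUL invocation looks roughly like the following (the netperf server hostname, test length and title are placeholders, not taken from the post):

```shell
# RRUL: 4 upload + 4 download TCP flows with concurrent latency probes.
# netperf.example.com is a placeholder; point -H at a reachable netperf server.
flent rrul -H netperf.example.com -l 60 \
    -t "sqm-cake-with-sfe-patch" -o rrul-plot.png
```

The `-t` title is stored with the results, and `-o` renders the default plot to a file for comparing runs with and without the patch.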

Ps. You might have already noticed, but due to the kernel patch reorganisation, your main SFE patch does not compile any more. Patches need to be in pending-4.9 etc.

I've been testing a new image with your new patches for about 2 hours, and voilà! SQM ingress finally works fine; the latency stays stable at 100-200 ms (my Internet provider is wireless). Tested with 5 devices connected at the same time.

PS: Move the patches to pending-4.9 and pending-4.4; otherwise the build fails.

Thanks for pointing that out; I updated the commit in the PR.

How is the throughput with:

a) both SQM and SFE enabled
b) only SQM enabled

If I understand the patch correctly, having SQM enabled basically asks SFE not to touch the packet, but throughput can still improve because CPU resources are freed up on other interfaces that don't have SQM enabled.

If your ISP bandwidth is the limiting factor, then you can probably enable SQM on the LAN interface to test this out, assuming your router isn't powerful enough to max out 1 Gbps with SQM on in any case...
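With sqm-scripts, that could be sketched via UCI roughly as follows (the section name, interface and rates are illustrative, not from the thread):

```shell
# Create an SQM queue instance on the LAN bridge (br-lan assumed)
uci set sqm.lan=queue
uci set sqm.lan.interface='br-lan'
uci set sqm.lan.download='500000'    # kbit/s, pick rates below the link capacity
uci set sqm.lan.upload='500000'      # kbit/s
uci set sqm.lan.qdisc='cake'
uci set sqm.lan.script='layer_cake.qos'
uci set sqm.lan.enabled='1'
uci commit sqm
/etc/init.d/sqm restart
```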



Can anyone with a dir-860l (b1) run an iperf3 test WAN->LAN? I don't have 2 gigabit devices at the moment.
You can find images here.
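A minimal WAN->LAN throughput test with iperf3 could look like this (192.0.2.10 is a placeholder for the LAN host's address as reachable from the WAN side, e.g. via a port forward):

```shell
# On the LAN-side machine: run the server
iperf3 -s

# On the WAN-side machine: 30-second TCP test towards the LAN host
iperf3 -c 192.0.2.10 -t 30
# Same, but with the server sending (reverse direction)
iperf3 -c 192.0.2.10 -t 30 -R
```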

Is this image based on master or stable? I can do the tests with the dir-860l, but I need master for VLAN support.
By the way, the dir-860l already did 1 Gbps WAN->LAN without SFE.

It's from master.

@adrian_dsl: I did a quick test on my "best effort" 1 Gbps FTTH connection; at this time of day, the max throughput is 550 Mbps with your image (like my previous master image without SFE).
I checked sirq % usage during the test, and it's really better: 30% (vs 49% with my previous image). I'll run a new test tonight; throughput should be close to 1 Gbps.

Here are some test results on the grandpa of them all... a Linksys WRT54G v2.0, alive and kicking since 2003, a lucky one with 32 MB RAM from the factory :slight_smile:

NAT speed, iperf with gigabit wired workstations on both sides:

  1. OpenWRT 10.03.1 pushes ~27 Mbit/s; that's the last usable OpenWRT on these beasts, things went down the drain after that
  2. LEDE stable 17.01.2 branch, ~18 Mbit/s; I guess we can "thank" the kernel developers for removing the routing cache and adding general bloat every year
  3. LEDE stable + @dissent1 PR applied makes it to ~32 Mbit/s, using the fast_classifier/shortcut_fe combo. Looks promising, it may give this box a second life!

Btw, I tested shortcut-fe-cm: no effect at all.
Also, is it necessary to hardcode the IPv6 dependency when building these modules? It wastes space on a precious 4 MB flash :slight_smile:

That's strange; does anyone else see the same behavior?

I seem to recall that was a security/robustness decision, see e.g.: https://home.regit.org/2013/03/david-miller-routing-cache-is-dead-now-what/. I am happy to truly thank, not "thank", the Linux network developers; stable, robust and correct beats fast, at least for my use cases :wink:

Best Regards

Well, quoting David Miller:
...the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables...
...Google sees hit rates on the order of only 10 percent...
...On simpler systems, cache is effective....

And guess what: in 99% of cases OpenWRT/LEDE runs on small, resource-constrained devices with a very SIMPLE end-user traffic pattern.
So obviously, while this particular change simplified kernel code, increased security and resistance to DoS attacks, and generally benefited enterprise usage scenarios, at the same time it catastrophically reduced routing performance where it matters most for the majority of users: simple and predictable NATed traffic flows on underpowered routers.

And I am certain the typical home user, like my aunt, would prefer being more easily DoSed to sacrificing 100 - 100*18/27 = 33.33 % of usable bandwidth... (To resolve the tension: my aunt does not care about either, so I believe outside of the enthusiast space people will simply shrug and potentially buy a more modern router (assuming one does not already come from the ISP))
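The percentage above checks out; as a quick sanity check (numbers taken from the WRT54G iperf results earlier in the thread):

```shell
# 27 Mbit/s on OpenWRT 10.03.1 vs 18 Mbit/s on LEDE 17.01.2
old=27
new=18
loss=$(awk -v o="$old" -v n="$new" 'BEGIN { printf "%.2f", 100 - 100*n/o }')
echo "${loss}%"   # prints 33.33%
```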

Ah, come on leave the hyperbole for the campaign trail...

Care to share where your numbers were taken from (assuming it is SFW)?

How about getting an adequately powered router instead?

But... I think I understand your complaint; I just thought your phrasing was a bit on the sarcastic side, that's all, no harm intended...

Well, perhaps my wording was too strong, but a 1/3 performance drop is still a rather major regression. Time goes on, software gains more features, and no one expects developers to hand-write assembly or optimize every single line of code for speed and memory, but come on, cuts this harsh are hard to swallow.

I assume household edge devices constitute a clear majority of routers on the Internet; that seems obvious to me. Improving the software on these devices, instead of throwing them out every 2 years like the average smartphone, would benefit everyone. And as we can see, there IS room for improvement: latency (SQM/bufferbloat efforts), throughput (fastpath, enabling hardware accelerators) and security (frequent firmware updates with a mainline kernel instead of vendors' prehistoric out-of-tree blobs).

Anyway, no offence intended and none taken; I definitely respect the kernel developers for the work they do.

The people working on the Linux Kernel are paid by _.
That is how Linux is going to go.

MWAN3 doesn't work with SFE.

A few years ago, I talked with an engineer who boasted that their team had redeveloped part of the kernel to handle traffic with multi-processing, and that it was much faster. He was very vague, but I guess he was speaking about fastpath or similar projects. So IMHO every large company has its own forked project. Just my 2 cents: fastpath is very promising, but in the end hardware acceleration always wins.