Qualcomm Fast Path For LEDE


#467

The offloaded count of 3338 in debug_info shows that it's working.

Check that number keeps increasing, but looks like you're good to go.
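
A rough way to check this from the router shell, as a sketch only. It assumes debug_info sits next to the other fast-classifier sysfs entries under /sys/fast_classifier/ and reports the counter as `offloaded=<number>`; the helper names are made up for illustration.

```shell
#!/bin/sh
# Print the value of the "offloaded=N" field from debug_info text on stdin.
# Assumption: fields are whitespace-separated "name=value" pairs.
offloaded_count() {
    tr -s ' \t' '\n\n' | sed -n 's/^offloaded=\([0-9][0-9]*\)$/\1/p' | head -n 1
}

# Read the counter twice, DELAY seconds apart (default 5).
# Returns 0 if it grew, 1 if it did not, 2 if the file could not be read.
offload_growing() {
    file=$1
    delay=${2:-5}
    before=$(offloaded_count < "$file") || return 2
    sleep "$delay"
    after=$(offloaded_count < "$file") || return 2
    [ -n "$before" ] && [ -n "$after" ] && [ "$after" -gt "$before" ]
}
```

On the router, `offload_growing /sys/fast_classifier/debug_info 10 && echo growing` would report whether the counter went up within 10 seconds while traffic is flowing.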


#468

Interesting.
Which version of dissent1's patch did you apply, and to which branch? And was SQM enabled on the WAN interface?

A.
I tried this pull request on the lede-17.01 branch with SQM applied to WAN, but they didn't work well together.

B.a)
gwlim's current version with only fast-classifier compiled and SQM@WAN:

  • it only "accelerated" 1 out of 2 separate VPN connections (don't ask why :slight_smile: )

B.b)
gwlim's current version with both fast-classifier and shortcut-fe-cm compiled, SQM@WAN, and rmmod shortcut-fe-cm (!!!):

  • both separate VPN connections were accelerated :slight_smile:

So, for now, I added this to rc.local with gwlim's version as in B.b):

echo 1 > /sys/fast_classifier/skip_to_bridge_ingress
rmmod shortcut-fe-cm

Thanks to both of you for your work!

EDIT: It turned out that both B.a) and B.b) are wrong with gwlim's patch.


#469

I'm bemused. I thought fast path only sped up NAT transfers, not local ones.


#470

Fast path speeds up anything that doesn't need to go through the usual kernel processing stack. Unless you've got a gigabit (or similar) connection, the only benefit is LAN to LAN.


#471

I used dissent1's latest version. eth0 on the TP-LINK TL-WR1043NDV2.1 is WAN.

A) I used the commit on the LEDE 17.01 branch with SQM. I ended up manually downloading the entire image at the pull request here. I then put the files in manually (creating all the directories and editing config-4.4). I also moved the patches from "hack-4.4" to "patches-4.4" in the same directory.

B.a) & B.b) I think this is where the difference lies: I don't use a VPN. If it is necessary for your setup, you might as well keep the workaround you're using right now.


#472

Hi,

Does this also work with USB-attached wireless/ethernet dongles?


#473

I wanted a build for the Archer C7 that is stock plus this Fast Path patch, so I did the following.

Cloned master:

git clone https://github.com/lede-project/source.git lede

Changed directory:

cd lede

Downloaded the patch:

wget https://patch-diff.githubusercontent.com/raw/lede-project/source/pull/1269.patch

Applied the patch:

git apply --ignore-space-change --ignore-whitespace 1269.patch

I got the following output:

1269.patch:173: trailing whitespace.
FAST Classifier connection manager for Shortcut forwarding engine. 
1269.patch:1862: trailing whitespace.
		if ((s32)(ct->proto.tcp.seen[1].td_end - sis->dest_td_end) < 0) { 
1269.patch:1897: trailing whitespace.
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4, 9, 0)) 
1269.patch:3696: trailing whitespace.
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4, 9, 0)) 
1269.patch:11351: space before tab in indent.
 	__u8			inner_protocol_type:1;
warning: squelched 219 whitespace errors
warning: 224 lines add whitespace errors.

Is this correct? Apologies if this has been posted before.
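
For what it's worth, the whitespace messages from git apply are warnings and do not by themselves mean the patch failed to apply. One way to double-check is a dry run before touching the tree. The following is only a sketch using `patch --dry-run` from the top of the source tree; the function name is made up for illustration.

```shell
#!/bin/sh
# Apply a patch with -p1 only if a dry run succeeds first.
# Run from the top of the source tree (e.g. the "lede" checkout).
apply_patch_checked() {
    patch_file=$1
    if [ ! -f "$patch_file" ]; then
        echo "no such patch file: $patch_file" >&2
        return 1
    fi
    # The dry run reports rejects without modifying any file.
    if ! patch -p1 --dry-run < "$patch_file" > /dev/null; then
        echo "patch would not apply cleanly: $patch_file" >&2
        return 1
    fi
    patch -p1 < "$patch_file"
}
```

Usage would be `apply_patch_checked 1269.patch` inside the cloned tree.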


#474

"patch -p1 < 1269.patch" should work


#475

Just move the patch files from hack-4.4 into patches-4.4, select the correct options in make menuconfig, and you should be good to go.
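
The move itself could be scripted roughly like this. It is a sketch only: the function name is made up, and the directories are passed as arguments because the exact location inside the tree is not spelled out here (target/linux/generic/ is an assumption).

```shell
#!/bin/sh
# Move every *.patch file from SRC into DST, skipping names that already
# exist in DST. Prints the number of files moved.
move_patches() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    mkdir -p "$dst" || return 1
    moved=0
    for f in "$src"/*.patch; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=$(basename "$f")
        if [ -e "$dst/$name" ]; then
            echo "skipping $name: already present in $dst" >&2
            continue
        fi
        mv "$f" "$dst/" && moved=$((moved + 1))
    done
    echo "$moved"
}
```

For example, assuming the generic kernel patches live under target/linux/generic/: `move_patches target/linux/generic/hack-4.4 target/linux/generic/patches-4.4`.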


#476

I think things are located correctly for a tree post commit.


#477

Isn't that master rather than 17.01?


#478

That is what @Gigabit cloned


#479

Short version:

I put back gwlim's original version of this patch (without removing anything; I also edited my 2 recent posts above) and am using it with SQM@WAN:

  • it is almost 2x faster than dissent1's and rock stable (I used it for a month before)
  • dissent1's disabled fast path acceleration after around 5 mins of intensive SQM ingress use

Edit: it turned out gwlim's patch doesn't work with SQM ingress.

Longer version:

I streamlined the firmware build so that everything is the same between the 2 (including the config files) apart from the SFE kernel modules.
Both firmwares are built on the lede-17.01@a006b48 (2017-08-29) branch and contain gwlim's amazing patchset.

  • gwlim patches
  • dissent1's patch for lede-17.01 branch (instruction is included)
    -- note: I also went through all the applied kernel patches to check that they are fine

Test setup:

  • Archer C5 v1.2 (mips74k, 720MHz, 128RAM, 16MB)
  • net: PPPoE, 76Mbits/20Mbits, ipv4
  • SQM@pppoe-wan, cake/layer-cake, 87500/24500, ATM:44
  • a laptop without VPN connection
    -- run a torrent client with 1 (same) download at max speed
  • another laptop with VPN connection
    -- stream music in the meantime
  • run htop (in tmux) on router to see CPU % (or top for sirq)
    -- that's the best indicator of whether SFE and SQM really work well together

Results:

dissent1's version:

  • first 5 mins: down 8900 KB/s , up 1600 KB/s , number of connections ~70, htop ~56%
  • after 5 mins, for another 10 mins (until the end of the test): htop ~80% !!!
    -- that means something broke!

gwlim's version:

  • down 9000 KB/s , up 1300 KB/s , number of connections ~70, -> htop ~30%

So, I suggest looking at sirq usage if you use dissent1's version with SQM.
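
If watching top by hand is inconvenient, the softirq share can also be computed from two readings of /proc/stat. This is a minimal sketch, not what was used for the numbers above; the function names are made up, and it relies on softirq being the seventh counter on the aggregate "cpu" line.

```shell
#!/bin/sh
# Print "<softirq_jiffies> <total_jiffies>" from /proc/stat-style text on stdin.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
cpu_sample() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        print $8, total
        exit
    }'
}

# Integer softirq percentage between two samples: sirq1 total1 sirq2 total2.
sirq_percent() {
    awk -v s1="$1" -v t1="$2" -v s2="$3" -v t2="$4" 'BEGIN {
        dt = t2 - t1
        if (dt <= 0) { print 0; exit }
        printf "%d\n", (s2 - s1) * 100 / dt
    }'
}

# Sample /proc/stat twice, INTERVAL seconds apart (default 5).
sirq_over() {
    interval=${1:-5}
    first=$(cpu_sample < /proc/stat) || return 1
    sleep "$interval"
    second=$(cpu_sample < /proc/stat) || return 1
    # Word splitting is intentional: each sample is "sirq total".
    sirq_percent $first $second
}
```

Running `while :; do sirq_over 10; done` during a test would print one softirq percentage every 10 seconds, which makes a jump like the one after 5 minutes easy to spot.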


#480

Should I mention that @gwlim's version contains not only SFE but also MIPS optimizations?
Regarding the issue that something breaks after some time: could you clarify whether you built the firmware with only sfe and f-c included (without sfe-cm), or with all three together and then simply ran rmmod on sfe-cm?
Also, please retest with sfe-cm loaded instead of fast-classifier.

And btw, running sqm within the tunnel (pppoe) is a bit different story.


#481

If there is a difference, it's most likely caused by

I've also added missing function to update udp statistics by
fast-classifier and some more small fixes.

The udp statistics thing seems to be unlikely since the main traffic load in your case was originating from the torrent TCP traffic. Gotta be some of the extra fixes.
I compared the main implementation files:

https://gist.github.com/MartB/214613e499a9c2364ee761ec4d67cbbb sfe_cm.c
https://gist.github.com/MartB/fb2ec15a253f8460809973f381c0ff00 fast-classifier.c

It does not look like the increase in IRQs is due to the patching; it might be some kernel patch.
I will dig into it.

Edit:
I checked most of the important stuff, but it does not seem to differ in most places (besides gwlim removing code and dissent1 using #ifdef).
Are you sure you had his stuff compiled properly?


#482

As per your instruction, only sfe and f-c were included and not sfe-cm.

Also note that gwlim's version requires all 3 modules, not just fast-classifier; otherwise the htop % value goes up high, which means no acceleration.

I can't, I haven't built your version with sfe-cm.

[quote="dissent1, post:480, topic:4582, full:true"]And btw, running sqm within the tunnel (pppoe) is a bit different story.
[/quote]
Well, I don't know anything about this, unfortunately this is what I have :slight_smile:

And as a last note: thanks to all of you for your work!


#483

If you are talking to me then: yes, I'm sure they are all proper. :slight_smile:
The main diff between the 2 (but I didn't go through the code changes) is the non-usage/usage of shortcut-fe-cm module.
And note, the test didn't run through VPN but normal connection.


#484

Yeah, well, you are only supposed to use either shortcut-fe-cm or the fast classifier anyway, so that's no issue then.
See my edit above for the stuff I found while briefly checking the patches (not much that warrants such a performance decrease).


#485

Not with gwlim's version, you need all the 3:


#486

@chros
You might be onto something.


Also inserts all modules.

Edit: As dissent1 pointed out in his commit message, shortcut-fe-cm always comes first if both are selected, so maybe there's an issue with fast-classifier only?
Can you test it with dissent1's version but with only shortcut-fe-cm enabled?
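
To rule out a mix-up about which connection manager is actually loaded during a test, something like the following could be run on the router. It is a sketch under the assumption that the modules show up in /proc/modules as shortcut_fe_cm and fast_classifier (hyphens in module file names become underscores there); the function name is made up.

```shell
#!/bin/sh
# Read /proc/modules-style text on stdin and print which SFE connection
# manager is loaded: "shortcut_fe_cm", "fast_classifier", "both" or "none".
active_manager() {
    awk '
        $1 == "shortcut_fe_cm"  { cm = 1 }
        $1 == "fast_classifier" { fc = 1 }
        END {
            if (cm && fc) {
                print "both"
            } else if (cm) {
                print "shortcut_fe_cm"
            } else if (fc) {
                print "fast_classifier"
            } else {
                print "none"
            }
        }'
}
```

Usage: `active_manager < /proc/modules`. A result of "both" would mean the two managers are loaded side by side, which is the case worth noting alongside the test results.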