Qualcomm Fast Path For LEDE

@philjohn,

Many thanks for your instructions!
I implemented it on a Ubiquiti EdgeRouter ER-X with the latest LEDE trunk and easily get 930 Mbit/s NAT performance (quick test).

You can either use:

git reset --hard HEAD

or just run make menuconfig again, deselect the two kmods, and rebuild.

You can also simply reverse-apply ("revert") the patch:

patch -R -p 1 -i patchfile
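If you're not sure whether the patch is currently applied, git can tell you before you reset or revert anything. A minimal sketch, assuming you run it from the root of the LEDE git checkout and that the patch file (for example 1269.patch) was generated against that tree; the function name patch_state is just for illustration:

```shell
# Hypothetical helper: report whether a patch is applied to the current git tree.
# Prints "unapplied", "applied" or "conflict"; it changes nothing on disk.
patch_state() {
    p="$1"
    if git apply --check --ignore-space-change --ignore-whitespace "$p" 2>/dev/null; then
        echo unapplied   # applies cleanly forwards, so it is not in the tree yet
    elif git apply --check -R --ignore-space-change --ignore-whitespace "$p" 2>/dev/null; then
        echo applied     # reverses cleanly, so it is already in the tree
    else
        echo conflict    # neither direction applies: the tree has diverged
    fi
}
```

For example, if patch_state 1269.patch prints "applied", then git apply -R 1269.patch (or the patch -R command above) should undo it.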

@philjohn

wget https://patch-diff.githubusercontent.com/raw/lede-project/source/pull/1269.patch
git apply --ignore-space-change --ignore-whitespace 1269.patch

Then, in make menuconfig, select:

Kernel Modules > Network Support > kmod-fast-classifier and kmod-shortcut-fe (not kmod-shortcut-fe-cm)

I just did this, and the build was fine for my Archer C7 v2. However, I had a problem with my VPN setup (IPsec road-warrior config using strongSwan): I can VPN into my router just fine, but I couldn't keep an RDP session to a machine behind my router connected (RDP client --> IPsec tunnel over the internet --> VPN server on router --> RDP server on the local LAN).

The RDP connection would start but then drop after about 10 seconds. I had to run "rmmod fast-classifier" to make everything stable again.

Before the rmmod, fast path seemed to be working ok:

root@router ~# cat /sys/fast_classifier/exceptions
NO_IIF = 12682
NO_CT = 1
CT_NO_CONFIRM = 727
TCP_NOT_ASSURED = 186
WAIT_FOR_ACCELERATION = 10445
CT_DESTROY_MISS = 1220
root@router ~# head -n1 /sys/fast_classifier/debug_info
size=84 offload=0 offload_no_match=0 offloaded=41 done=39 offl_dbg_msg_fail=41 done_dbg_msg_fail=39
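To watch a single counter rather than eyeballing the whole line, a small helper can pull one key=value field out of the first line of debug_info. A sketch, assuming the space-separated format shown above stays the same (sfe_field is a hypothetical name):

```shell
# Hypothetical helper: print the value of one key=value field from a
# debug_info line, e.g. "offloaded" from "... offloaded=41 done=39 ...".
sfe_field() {
    key="$1"; line="$2"
    # One field per line, then strip the "key=" prefix from the matching one.
    printf '%s\n' "$line" | tr ' ' '\n' | sed -n "s/^$key=//p"
}
```

On the router: sfe_field offloaded "$(head -n1 /sys/fast_classifier/debug_info)".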

Any idea what could be wrong?

I'm afraid I don't know. It's working fine here with an OpenVPN server running, but I haven't used strongSwan before (nor do I know much about it). It might be better to contact the upstream project on CodeAurora about the issue.

Thanks for the hint, but doesn't this work with the 'lede-17.01' branch? (I just tried it and there's no such option in the kernel config.)

It worked for me; did you follow the instructions exactly?

Also, are you configuring the kernel directly (which is possible), or running make menuconfig in the root of the LEDE checkout?

One more thing: you may need to move the patches from hack-4.4 to patches-4.4 (look in the patch to see the full path).
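Instead of moving the files by hand after applying, you can rewrite the paths in the patch before applying it. A rough sketch, assuming the affected paths contain /hack-4.4/ exactly as in the patch headers (look at the patch first, since sed will also rewrite that string anywhere else it appears); retarget_patch is a hypothetical helper:

```shell
# Hypothetical helper: copy a patch, pointing every .../hack-4.4/... path
# at .../patches-4.4/... instead. The input file is left untouched.
retarget_patch() {
    in="$1"; out="$2"
    sed 's#/hack-4\.4/#/patches-4.4/#g' "$in" > "$out"
}
```

For example: retarget_patch 1269.patch 1269-17.01.patch, then try git apply --check 1269-17.01.patch before applying it for real.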

That was it. Also, the 4.9 changes have to be removed from the patch completely before it will apply to the current lede-17.01 branch. (I haven't compiled it yet, but it should be fine.)
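Removing the 4.9 parts can be scripted as well. This sketch assumes the patch is in git format (each file section starts with a diff --git line) and that the 4.9 changes live in their own files, such as a *-4.9/ patch directory or a config-4.9 file; 4.9 changes inside a shared file (a Makefile, for instance) are not touched and would still need editing by hand. drop_kernel_49 is a hypothetical name:

```shell
# Hypothetical helper: print a git-format patch without the per-file sections
# whose "diff --git" header mentions a 4.9 kernel directory or config file.
drop_kernel_49() {
    awk '
        # A new file section starts: decide whether to skip it.
        /^diff --git / { skip = ($0 ~ /-4\.9\// || $0 ~ /config-4\.9/) }
        # Print every line that is not inside a skipped section.
        !skip
    ' "$1"
}
```

Usage: drop_kernel_49 1269.patch > 1269-no49.patch. Any summary in the mail header at the top of the file will no longer match the remaining sections.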
Thanks for your help!

So, do we know exactly how SFE should behave with SQM?
Thanks

Aren't SQM and fast path opposing trade-offs?
As in:

SQM: use more CPU to better manage scarce capacity/bandwidth
fast path: use less CPU to better handle high capacity (because the CPU is too slow otherwise)

No, because no matter how much bandwidth you have, it will be full at some point. :slight_smile:

I've now enabled SQM. Remember that SQM is only on your WAN interface, whereas fast path accelerates local traffic as well, so the two aren't necessarily competing.

Question: if you run SQM on your WAN interface, is the sirq load during a (saturating) speed test lower with fast path enabled or not? If so, by how much? And a final question: is SQM@WAN without fast path already throttling your internet connection (asked differently, does SQM alone already max out your CPU)?
I am trying to figure out what to recommend to SQM users (obviously without having to test fast path myself, -EOUTOFTIME).

Best Regards
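For the sirq question, one way to get a number without watching top is to sample /proc/stat before and after a saturating speed test and compute the softirq share of CPU time. A sketch (sirq_percent is a hypothetical helper); it assumes the standard /proc/stat field order, where guest time is already included in user time and is therefore left out of the total:

```shell
# Hypothetical helper: softirq share (%) of CPU time between two samples of a
# /proc/stat "cpu" line. Field order: cpu user nice system idle iowait irq
# softirq steal guest guest_nice. Guest time is already counted in user time,
# so only fields 2-9 (user..steal) are summed for the total.
sirq_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            t = 0
            for (i = 2; i <= 9; i++) t += $i
            if (NR == 1) { t1 = t; s1 = $8 } else { t2 = t; s2 = $8 }
        }
        END {
            if (t2 > t1) printf "%.1f\n", 100 * (s2 - s1) / (t2 - t1)
            else print "0.0"
        }'
}
```

Take a=$(grep '^cpu ' /proc/stat) before the test and b the same way afterwards, then run sirq_percent "$a" "$b", once with fast path loaded and once without. On a multi-core router it is worth feeding it the per-core lines (cpu0, cpu1, ...) too, because softirq work can pin one core while the aggregate figure still looks modest.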

It's difficult to measure in practice, as I'm running an R7800, which doesn't really break a sweat handling my 80/20 FTTC line even with fast path disabled.

Fast path comes into its own for local file transfers, though, and those don't have SQM applied.

-ECATCH22

I finally compiled the lede-17.01 branch with his patch for an Archer C5 v1 yesterday, and the debug counters barely increase. If I disable it, the SQM counters start to grow (SQM@WAN).

@moeller0, unfortunately I have a crappy connection (76/20 Mbps) so I can't really test your cases.

Now I'm compiling gwlim's current version; I'll enable only kmod-fast-classifier and see how it behaves with SQM.

I know it might be too much to ask, but has anyone successfully built with this patch for a device with 4 MB of flash?

@clyang, I haven't tried yet, but technically it shouldn't be a problem. There should be enough space if you build your own lite (4 MB) image. But which device with 4 MB of flash and a gigabit switch do you have in mind? On standard 100 Mbit/s Ethernet there isn't any real performance increase apart from lower SIRQ load, so maybe Wi-Fi could benefit from that.

Forgive me, I'm a beginner. I can't really tell whether it's working for me either, but I'll post my stats anyway. My connection is extremely slow compared to everyone else's here: 10 Mbit/s down and 1 Mbit/s up. I was recently playing around with SQM, so its stats are a bit more recent, but SQM was running alongside fast-classifier the whole time before I started tweaking.

  root@Downstairs:~# uptime
  13:06:20 up 17:51,  load average: 0.06, 0.01, 0.00
  root@Downstairs:~# cat /sys/fast_classifier/debug_info
  size=36 offload=0 offload_no_match=0 offloaded=3338 done=3329 offl_dbg_msg_fail=3338 done_dbg_msg_fail=3329 

(This is followed by MAC addresses, local IPs and ports on the left, and outside IPs on the right.)

  root@Downstairs:~# tc -s qdisc
  qdisc noqueue 0: dev lo root refcnt 2
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
  qdisc cake 802e: dev eth0 root refcnt 2 bandwidth 900Kbit diffserv3 dual-srchost nat rtt 200.0ms noatm overhead 18 via-ethernet mpu 64
  Sent 38928241 bytes 405502 pkt (dropped 410, overlimits 495872 requeues 0)
  backlog 0b 0p requeues 0
  memory used: 386848b of 4Mb
  capacity estimate: 900Kbit
               Bulk   Best Effort      Voice
  thresh      56248bit     900Kbit     225Kbit
  target       323.0ms      20.2ms      80.7ms
  interval     646.0ms     210.2ms     161.5ms
  pk_delay         0us      15.2ms       6.7ms
  av_delay         0us       3.0ms       1.4ms
  sp_delay         0us       114us        29us
  pkts               0      403270        2642
  bytes              0    38964390      372875
  way_inds           0        1432           0
  way_miss           0        5309         456
  way_cols           0           0           0
  drops              0         410           0
  marks              0           0           0
  sp_flows           0           0           0
  bk_flows           0           1           0
  un_flows           0           0           0
  max_len            0        1514         590

  qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
  Sent 883176955 bytes 997916 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
  qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
  Sent 114947366 bytes 320182 pkt (dropped 0, overlimits 0 requeues 3)
  backlog 0b 0p requeues 3
  maxpacket 549 drop_overlimit 0 new_flow_count 6 ecn_mark 0
  new_flows_len 0 old_flows_len 0
  qdisc noqueue 0: dev br-lan root refcnt 2
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
  qdisc noqueue 0: dev wlan0 root refcnt 2
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
  qdisc noqueue 0: dev wlan0.sta1 root refcnt 2
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
  qdisc cake 802f: dev ifb4eth0 root refcnt 2 bandwidth 9Mbit besteffort dual-dsthost nat wash rtt 200.0ms noatm overhead 18 via-ethernet mpu 64
  Sent 895657214 bytes 996912 pkt (dropped 1004, overlimits 898838 requeues 0)
  backlog 0b 0p requeues 0
  memory used: 371008b of 4Mb
  capacity estimate: 9Mbit
               Tin 0
  thresh         9Mbit
  target        10.0ms
  interval     200.0ms
  pk_delay       166us
  av_delay        10us
  sp_delay         2us
  pkts          997916
  bytes      897147779
  way_inds       44213
  way_miss        5403
  way_cols           0
  drops           1004
  marks              3
  sp_flows           0
  bk_flows           1
  un_flows           0
  max_len         1514
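A quick way to compare runs (for example with fast-classifier loaded vs. unloaded) is to boil tc -s qdisc output down to sent and dropped packets per qdisc. A sketch using awk, assuming the line layout shown above (a qdisc line followed by its Sent line); qdisc_drops is a hypothetical name. For the cake instance on eth0 above, it gives 410 drops out of 405502 packets, about 0.10%:

```shell
# Hypothetical helper: read "tc -s qdisc" output on stdin and print one line
# per qdisc: <kind> <dev> <packets sent> <packets dropped> <drop %>
qdisc_drops() {
    awk '
        # "qdisc <kind> <handle> dev <dev> ..." starts a new qdisc entry.
        /^ *qdisc / { kind = $2; dev = $5 }
        # "Sent <bytes> bytes <pkts> pkt (dropped <n>, ..." carries the counters.
        /^ *Sent /  {
            d = $7; sub(/,$/, "", d)
            pct = ($4 > 0) ? 100 * d / $4 : 0
            printf "%s %s %s %s %.2f\n", kind, dev, $4, d, pct
        }
    '
}
```

Usage: tc -s qdisc | qdisc_drops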

I applied this to improve wireless transfer speeds to my NAS.
I'm using fast-classifier with shortcut-fe only, taken from dissent1's RFC commit.

EDIT:
Here are my bufferbloat results (8 streams down, 4 streams up [as much as my cable connection will allow]; High Res Bufferbloat, 30 s upload and download, Dodge Compression enabled). I don't know if this will help, @moeller0. A relatively quiet network allowed me to test further. The fast-classifier debug info rose to 3800+ in 3 hours, with around 2 active clients at the moment. @philjohn

(Ignore the A+; I forgot to change it a while ago, before upgrading, when my connection was still 5 Mbit/s max.)

The offloaded count of 3338 in debug_info shows that it's working.

Check that the number keeps increasing, but it looks like you're good to go.
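To check that the counter keeps increasing, compare two readings taken some time apart. A sketch, assuming the debug_info format shown in this thread; offloaded_delta is a hypothetical helper, and a positive result simply means the counter moved between the two readings:

```shell
# Hypothetical helper: change in the "offloaded" counter between two readings
# of the first line of /sys/fast_classifier/debug_info.
offloaded_delta() {
    a=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^offloaded=//p')
    b=$(printf '%s\n' "$2" | tr ' ' '\n' | sed -n 's/^offloaded=//p')
    # A missing field is treated as 0 so the arithmetic never fails.
    echo $(( ${b:-0} - ${a:-0} ))
}
```

On the router: a=$(head -n1 /sys/fast_classifier/debug_info); sleep 60; b=$(head -n1 /sys/fast_classifier/debug_info); offloaded_delta "$a" "$b"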

Interesting.
Which version of dissent1's patch did you apply, and to which branch? And was SQM enabled on the WAN interface?

A.
I tried this pull request on the lede-17.01 branch with SQM applied to WAN, but they didn't work well together.

B.a)
gwlim's current version with only fast-classifier compiled, with SQM@WAN:

  • it only "accelerated" 1 of 2 separate VPN connections (don't ask why :slight_smile: )

B.b)
gwlim's current version with both fast-classifier and shortcut-fe-cm compiled, SQM@WAN, and shortcut-fe-cm unloaded with rmmod (!!!):

  • both separate VPN connections were accelerated :slight_smile:

So, for now, I've added this to rc.local with gwlim's version from B.b):

echo 1 > /sys/fast_classifier/skip_to_bridge_ingress
rmmod shortcut-fe-cm

Thanks to both of you for your work!

EDIT: It turned out that both B.a) and B.b) were wrong with gwlim's patch.
