OpenWrt Forum Archive

Topic: Performance Oriented OpenWRT Builds

The content of this topic has been archived between 12 Sep 2015 and 21 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

Yeah, I got it working too with the patches in the ticket, but like I said, it crashed almost immediately. Sometimes it worked for a minute before crashing. It's old code that they just hacked into some kind of working state, and they've removed it from OpenWrt too.

Hello Alpha

Could I ask whether the build has the iptables '-m time' option available? I need a build that can time-restrict internet access. WDR4300

Thanks

jinott wrote:

Hello Alpha

Could I ask whether the build has the iptables '-m time' option available? I need a build that can time-restrict internet access. WDR4300

Thanks

I tried the time option and it seems to be available, which probably means default OpenWrt has it as well.

(Last edited by alphasparc on 22 Jun 2015, 18:11)

Thanks alpha... You're right, I think. I've been trying 'Rooter' builds and thought the iptables '-m time' option was missing or not working. However, I worked out that it was a case of me not realising that the rules use UTC, not local time.
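Since the time match compares against UTC, a local schedule has to be shifted before it goes into a rule. Below is a minimal Python sketch of that conversion. The helper names, the fixed `utc_offset_hours` parameter, and the choice of the FORWARD chain with a DROP target are assumptions for illustration and not something taken from these builds; a fixed offset also ignores daylight saving time. Some iptables versions additionally offer a `--kerneltz` flag, which is not relied on here.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_hhmm(hhmm: str, utc_offset_hours: float) -> str:
    """Convert a local wall-clock 'HH:MM' to the UTC 'HH:MM' a rule needs."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters here.
    local = datetime.strptime(hhmm, "%H:%M").replace(
        year=2015, month=6, day=22, tzinfo=local_tz
    )
    return local.astimezone(timezone.utc).strftime("%H:%M")


def time_block_rule(start_local: str, stop_local: str, utc_offset_hours: float) -> str:
    """Build an illustrative iptables command for a local-time window."""
    start = local_to_utc_hhmm(start_local, utc_offset_hours)
    stop = local_to_utc_hhmm(stop_local, utc_offset_hours)
    # Chain and target are placeholders; adapt them to the actual firewall setup.
    return (
        "iptables -I FORWARD -m time "
        f"--timestart {start} --timestop {stop} -j DROP"
    )


# Example: block from 22:00 to 06:00 local time for a zone at UTC+1.
print(time_block_rule("22:00", "06:00", utc_offset_hours=1))
```

The generated command is only a string; review it before applying it on a router.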

I tried today to install your June image for the WDR4300 v1. I went back to the revert-to-stock image first and then tried, but I get error 18005. I wondered if it was somehow because I've installed Rooter images before? Currently only Rooter images and the revert-to-stock image flash OK. I'm confused as to why; any ideas appreciated.

I have no idea what changes Rooter made to the firmware, so you will have to consult Rooter.
Basically, the images are built so that they are sysupgradable between Gargoyle and OpenWrt.
The WDR4300 has an Israel version and the normal international ones; check that you downloaded the correct version.

Did you manage to somehow improve on NAT speed when comparing to stock OpenWrt?

With stock firmware, devices use hardware NAT, but with OpenWrt you only get software NAT and therefore lower NAT performance.

Did you work on improving NAT performance?

Can you please explain what the main performance gains of your firmware are compared to stock OpenWrt? I'm eager to learn how you enhanced OpenWrt. Thanks.

I modified the default sysctl.conf and added the QCA patches; basically I tested and added all the patches.
Enabling JUMP_LABEL also improved performance, more for PPC than for MIPS.
I can't remember all the changes I made; they are all stored as patches so I don't have to.
There is also the effect of module load addresses: if you compile network functions into the kernel instead of purely as kmodules, it goes faster (higher throughput), especially on MIPS.
I also added ASM optimization for OpenSSL on PPC, very much like the MIPS one OpenWrt already has.
The whole exercise is very interesting and the patches are the results of my testing :)

Try -> Compile -> Test: if it works, accept; otherwise reject.

All the changes are in the patch folder, feel free to take a look.
If you want to test the performance difference, you can run the OpenSSL benchmark as well as the NAT benchmark on my builds and on default OpenWrt Barrier Breaker.
I am working on the Chaos Calmer release at the moment, but there is something wrong with LuCI/Lua in Chaos Calmer. I can feel it in the UI responsiveness but can't tell what it is.

(Last edited by alphasparc on 18 Jul 2015, 15:27)

alphasparc wrote:

jperf testing (No SQM-Scripts Running) -

WR1043NDv1 overclocked to 430 MHz - NAT ~ 290 Mbit/s
WDR4300v1 overclocked to 730 MHz - NAT ~ 530 Mbit/s
WDR4900v1 stock clock (800 MHz) - NAT ~ 530 Mbit/s

Can you please also post non-overclocked results, to see what the difference is, and the most interesting comparison: against current stock BB and CC firmware. If you get a 3% increase that could be a statistical error, but if you get 30% or more then you deserve high praise!
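The point about a 3% gain versus a 30% gain can be made concrete by repeating each benchmark a few times and comparing the gap between the averages with the run-to-run spread. The following is a rough Python sketch of that idea. The function names and the "sum of standard deviations" threshold are assumptions chosen for illustration (a crude rule of thumb, not a formal significance test), and the sample numbers are placeholders, not measurements from this thread.

```python
from statistics import mean, stdev


def relative_gain(baseline: list[float], candidate: list[float]) -> float:
    """Percentage change of the candidate mean over the baseline mean."""
    return (mean(candidate) - mean(baseline)) / mean(baseline) * 100.0


def exceeds_noise(baseline: list[float], candidate: list[float]) -> bool:
    """Crude check: is the gap between means larger than the combined spread?"""
    spread = stdev(baseline) + stdev(candidate)
    return abs(mean(candidate) - mean(baseline)) > spread


# Placeholder throughput samples in Mbit/s, NOT results from any real router.
stock = [100.0, 102.0, 98.0]
tuned = [130.0, 128.0, 132.0]
print(f"gain: {relative_gain(stock, tuned):.1f}%")
print(f"beyond run-to-run noise: {exceeds_noise(stock, tuned)}")
```

With at least three runs per firmware, a small gain that stays inside the spread is hard to distinguish from measurement noise, while a large one stands out clearly.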

(Last edited by valentt on 19 Jul 2015, 11:22)

valentt wrote:
alphasparc wrote:

jperf testing (No SQM-Scripts Running) -

WR1043NDv1 overclocked to 430 MHz - NAT ~ 290 Mbit/s
WDR4300v1 overclocked to 730 MHz - NAT ~ 530 Mbit/s
WDR4900v1 stock clock (800 MHz) - NAT ~ 530 Mbit/s

Can you please also post non-overclocked results, to see what the difference is, and the most interesting comparison: against current stock BB and CC firmware. If you get a 3% increase that could be a statistical error, but if you get 30% or more then you deserve high praise!

You can download the firmware and test it.
I believe the only way you can believe something is when you test it yourself.
http://wiki.openwrt.org/inbox/benchmark.nat
Let me know what you get, so there is no confirmation bias.

(Last edited by alphasparc on 19 Jul 2015, 14:53)

I love testing, but it is not trivial. First, I have to leave my main router up and running because my significant other works from home most of the time. If you have devices and things already set up, please share your numbers. I'll also do my own tests, but it takes time to set everything up and to get a device to test on.

This thread has some interesting NAT benchmarks - https://forum.openwrt.org/viewtopic.php?id=53703

(Last edited by valentt on 20 Jul 2015, 00:01)

valentt wrote:

This thread has some interesting NAT benchmarks - https://forum.openwrt.org/viewtopic.php?id=53703

That thread has nothing to do with NAT, see the 1st post in there. It's a test between a wireless and wired client, both on LAN. The problem looks to be in the wifi driver.

(Last edited by mastabog on 20 Jul 2015, 19:12)

alphasparc wrote:

-OpenSSL Assembly for PowerPC (Tested Performance boosted Significantly)

@everybody Unfortunately I don't have a WDR4900, which is PowerPC based, so I can't do any benchmarks to compare. Can anybody with a WDR4900 run the OpenSSL benchmark (the instructions are really easy to follow [1]) and post the numbers here and also in the table on the wiki?

@alphasparc The highest number for AES-128 with the WDR4900 currently in the OpenSSL benchmark table is 23153400. Did you manage to beat that?

[1] http://wiki.openwrt.org/inbox/benchmark.openssl

(Last edited by valentt on 23 Jul 2015, 17:16)

alphasparc wrote:

I added L7 in p2p-block, turned on torrent , all my torrents are blocked, works fine without crashing.

I always thought that L7 or any other blacklist method doesn't work on torrents if they are encrypted, and most clients use encryption now.

Just to note, there is no crypto acceleration in the WDR4900v1, only assembler optimization with my patch.
The crypto CAAM device is not initialised and OpenSSL is not built with /dev/crypto support to enable it.

valentt wrote:
alphasparc wrote:

I added L7 in p2p-block, turned on torrent , all my torrents are blocked, works fine without crashing.

I always thought that L7 or any other blacklist method doesn't work on torrents if they are encrypted, and most clients use encryption now.

It works, but I dropped it because it doesn't block over IPv6.
So it seems like an ugly function that only half worked, on IPv4 only. I don't like ugly functions, so I dropped it.
Maybe blocking with a string-matching function in iptables is a more elegant solution.

(Last edited by alphasparc on 28 Jul 2015, 16:48)

alphasparc wrote:

Just to note, there is no crypto acceleration in the WDR4900v1, only assembler optimization with my patch.
The crypto CAAM device is not initialised and OpenSSL is not built with /dev/crypto support to enable it.

I just took a look at the chip.
The part number of the chip is P1014NSN5HFA.
From the datasheet:
P = 45nm
1 = Platform
01 = Single Core
4 = Derivative
N = Commercial TieN = Qual’d to Industrial Tier
S = Std Temp
N = SEC Not Present
5 = TEPBGA-1 Pb free
H = 800 MHz
F = 667 MHz
A = Rev 1.0

So yea NO HW ENCRYPTION for this chip
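The field-by-field reading above can be sketched in Python as follows. The field widths follow the list in this post; the field names (`process`, `security_engine`, `speed_1`, and so on) and the helper functions are illustrative labels invented here, not datasheet terminology. The only code the sketch interprets is `N` in the security field, which the list above gives as "SEC Not Present"; other codes for that field are not covered by this post.

```python
# (name, width) pairs in the order the characters appear in the part number.
# Names are hypothetical labels; widths follow the breakdown quoted above.
FIELDS = [
    ("process", 1),
    ("platform", 1),
    ("cores", 2),
    ("derivative", 1),
    ("qualification", 1),
    ("temperature", 1),
    ("security_engine", 1),
    ("package", 1),
    ("speed_1", 1),
    ("speed_2", 1),
    ("revision", 1),
]


def split_part_number(part: str) -> dict[str, str]:
    """Cut a part number into its fields, left to right."""
    expected = sum(width for _, width in FIELDS)
    if len(part) != expected:
        raise ValueError(f"expected {expected} characters, got {len(part)}")
    fields = {}
    pos = 0
    for name, width in FIELDS:
        fields[name] = part[pos:pos + width]
        pos += width
    return fields


def sec_not_present(part: str) -> bool:
    """True when the security field is 'N' (SEC Not Present per the list above)."""
    return split_part_number(part)["security_engine"] == "N"


fields = split_part_number("P1014NSN5HFA")
print(fields)
print("SEC not present:", sec_not_present("P1014NSN5HFA"))
```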

(Last edited by alphasparc on 7 Aug 2015, 04:53)

The 4.1 kernel introduces some PPC SPE optimized crypto modules that can be used through cryptodev; however, in my results they actually make OpenVPN slower. You can see the results on the last few pages of my thread if you are interested.

Here's the patch to enable the new modules:

diff --git a/package/kernel/linux/modules/crypto.mk b/package/kernel/linux/modules/crypto.mk
index 84e5147..ce7d88c 100644
--- a/package/kernel/linux/modules/crypto.mk
+++ b/package/kernel/linux/modules/crypto.mk
@@ -224,6 +224,29 @@ endef
 
 $(eval $(call KernelPackage,crypto-hw-ppc4xx))
 
+define KernelPackage/crypto-ppc-spe
+  TITLE:=PPC SPE crypto modules
+  DEPENDS:=@TARGET_mpc85xx
+  KCONFIG:= \
+       CONFIG_CRYPTO_SHA1_PPC_SPE \
+       CONFIG_CRYPTO_SHA256_PPC_SPE \
+       CONFIG_CRYPTO_AES_PPC_SPE \
+       CONFIG_CRYPTO_MD5_PPC
+  FILES:= \
+       $(LINUX_DIR)/arch/powerpc/crypto/aes-ppc-spe.ko \
+       $(LINUX_DIR)/arch/powerpc/crypto/sha1-ppc-spe.ko \
+       $(LINUX_DIR)/arch/powerpc/crypto/sha256-ppc-spe.ko \
+       $(LINUX_DIR)/arch/powerpc/crypto/md5-ppc.ko
+  AUTOLOAD:=$(call AutoLoad,90,aes-ppc-spe sha1-ppc-spe sha256-ppc-spe md5-ppc)
+  $(call AddDepends/crypto,+kmod-crypto-hash)
+endef
+
+define KernelPackage/crypto-ppc-spe/description
+  Crypto modules implemented using PowerPC SPE SIMD instruction set.
+endef
+
+$(eval $(call KernelPackage,crypto-ppc-spe))
+
 
 define KernelPackage/crypto-hw-omap
   TITLE:=TI OMAP hardware crypto modules

They speed up kernel IPsec a lot.

Yep, I saw that patch, but I'm quite sure the kernel has to be built with SPE explicitly disabled, otherwise it will not boot.

I thought cryptodev is only needed if there is a hardware crypto engine that cannot be accessed directly by the kernel?
If the SPE optimization is already in the kernel, there isn't a need for cryptodev anymore.

(Last edited by alphasparc on 7 Aug 2015, 06:46)

Actually, that's incorrect. Kernel and userspace are different things.

# zcat /proc/config.gz | grep SPE
CONFIG_SPE_POSSIBLE=y
CONFIG_SPE=y
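The same check can be scripted. Below is a small Python sketch that parses kernel config text and filters for SPE-related options. The function names are made up for illustration; reading `/proc/config.gz` only works on kernels built with `CONFIG_IKCONFIG_PROC`, and Python itself is an assumption here, since it is not necessarily installed on a router. The sample string is just the two lines shown above.

```python
import gzip


def parse_kernel_config(text: str) -> dict[str, str]:
    """Map CONFIG_* names to their values, skipping comments and blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def read_running_config(path: str = "/proc/config.gz") -> dict[str, str]:
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as handle:
        return parse_kernel_config(handle.read())


# Parse the two lines from the grep output above instead of touching /proc.
sample = "CONFIG_SPE_POSSIBLE=y\nCONFIG_SPE=y\n"
spe_options = {k: v for k, v in parse_kernel_config(sample).items() if "SPE" in k}
print(spe_options)
```

On a device with Python available, `read_running_config()` could replace the sample string.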

Cryptodev lets userland use kernel crypto, hardware or not. The OpenSSL library, for example, has its own AES implementation. Instead, you can let it use cryptodev and thus get PPC SPE acceleration. Here are my speed scores with cryptodev:

| r46541 | 1.0.2d | 108506760 | 69118600 | 38479140 | 22166140 | 15238290 | 5536860 | 32215110 | 28374450 | 25315820 | 18.5 | 628.6 67.8 | 55.7 |

Like I said though, it doesn't translate into more speed for OpenVPN, so I haven't found it useful. Just interesting :)

I'd like to use your build on my TP-Link TL-WDR4300 v1 (OpenWrt Barrier Breaker r40509 / LuCI Trunk (svn-r9978)) in the hope of increasing my internet speed.

I haven't played around with flashing my router with OpenWrt for a while. Could you please let me know whether I should use the build below, so I don't mess anything up :)

https://github.com/gwlim/Openwrt_Firmwa … pgrade.bin

Or that one?
https://github.com/gwlim/Openwrt_Firmwa … pgrade.bin

I'm going to upgrade the router using the LuCI "Flash new firmware image" option and keep all settings.

Thanks

I figured out that I need to use the first option which is for international routers. Does that firmware overclock the router? Will I get better performance using your optimized build vs CC?

visata wrote:

I figured out that I need to use the first option which is for international routers. Does that firmware overclock the router? Will I get better performance using your optimized build vs CC?

I went directly from CC to his August build on the WDR4300, and I'd say the BB build has better performance, while the advantages of CC are definitely in software/firmware flexibility. You could port a few patches over to CC to optimize it the way this build is, especially if you're overclocked (btw I had to try 3 of his boot-loaders before I found one that worked properly, so definitely test that). Personally I couldn't build a CC image as optimized for performance as the current BB one, although it's definitely possible. I would maybe try using arokh's config as a base, with a lot of packages removed (because the build is too large for the WDR4300), and then see which relevant patches from this build could be fitted into arokh's, but it would be a minimalistic hybrid of both.

But I'm just gonna wait for alphasparc because I have no idea what I'm doing :P

December build is out.
Removed the mips16 interlink option (it was originally required to jump from non-mips16 code to mips16 code, but once mips16 itself is removed it can be dropped completely).

[RANT]
mips16 is really an abomination.
-Does not save any space for uclibc
-Slows down performance by a ton (tested)

Why the hell was it even invented?

A lot of you asked: why no Chaos Calmer?
I have a working refreshed patch set for Chaos Calmer, but in my tests there is very little performance improvement over my Barrier Breaker builds.
So there is no point in it anymore.
-I do not subscribe to the view of getting new routers (heard of ewaste? wonder where all your disposed electronics goes?)
-If something runs fast on weak devices it will fly on strong devices

OpenWrt is getting heavier and heavier with each release and is not friendly to weaker processors anymore.
-I wonder if the developers actually test the release on their own devices at all.

Why is this happening?
Because people are not doing any testing at all.
Chaos Calmer does not automatically mean better; it is just Barrier Breaker plus the patches that were not backported.
More features != better, only when you need them!

The Internet is getting faster, so either you increase the performance of the software to keep up or you have to discard the device and get a better one.
But if you are discarding them, then why bother supporting them in OpenWrt?
[/RANT]

(Last edited by alphasparc on 27 Nov 2015, 06:38)

It's not about saving disk space: https://imgtec.com/mips/architectures/ase16e/

Better to seek information than to bash something you don't know about. My WNDR3700 has better OpenSSL scores with current DD than it ever did with AA, with MIPS16 turned on. I know there were some packages (dropbear for example) that were dog slow when compiled with MIPS16, but issues like that have been ironed out by fixing the code or turning MIPS16 off for problematic packages. Have you created any tickets?

The performance improved because of other things, not because of the ASE. And I submit no tickets? LOL
My current optimized BB build yields the same scores or higher.



You're not seriously complaining about a distribution that fits on 4MB is getting "heavy"? What exactly is it that you feel is not tested good enough? And why are you complaining instead of helping? Keep in mind the amount of routers that are supported and how much you pay for that work.

It is about the increase in heavy processes and listening sockets in the base software. You haven't noticed, have you?



So does that mean Windows 10 is simply Windows 8 plus patches not backported as well? Software development is not only about features, it's also about fixing bugs and improving security.

The core function of a router is to route, if it can't route fast enough for your internet connection nobody is gonna use it.
You can have the most secure slow pc but what is the point if you can't do useful things with it?


How do you figure that? Routers follow Moore's law same as everything else, have you seen the specs on current routers? Dual core 1GHz+ with 256/512MB of RAM is common now. Anyways, OpenWrt still runs great on older devices. It's just physically impossible for them to push 300+ Mbps using software NAT, compiler optimization or not. You can do a whole bunch of other useful stuff with them using current software though.

This is bullsh!t, try using one of my BB builds. I have been optimizing the performance solely for this purpose.
Which is why I am complaining: the base system is starting to use up so much performance that no form of optimization increases the performance anymore.

(Last edited by alphasparc on 28 Nov 2015, 00:55)

arokh wrote:

You mind communicating like an adult? If you read again, I was replying to your comment about MIPS16 being an "abomination" and only about saving disk space. I didn't say you submitted no tickets, I _asked_ if you did. Like, did you report problematic packages or the likes, because it seems to run fine now. For performance I too would turn it off though, but you have that option so I'm not sure what you are complaining about?

MIPS16 was previously not enabled at all, then suddenly the developers enabled it claiming a boost in performance which I have not been able to reproduce.
So the point is why enable a feature which is useless in the first place?

No I haven't, the only load is soft IRQ's when doing throughput testing on DD as far as I can tell. I remember there was an issue with odhcpd which has been fixed long ago.

Yes, and I was the person who reported it because I noticed it, and even then my observation was denied. My complaint is that there is insufficient empirical testing to ensure the performance of the software.
SIRQ is normal because interrupts are required for packet processing.
I am talking about all the procd and uci processes listening on sockets unnecessarily.



It used to be, today routers have two USB ports for print and file sharing, VPN setups etc. In any case, the core of OpenWrt still only does routing and firewalling. If your router is not fast enough you have the option to buy a faster one. If you're worried about e-waste simply donate your old one to a school.

I don't think you understand the problem, it is about maximizing performance on limited hardware to add value to the software.
And you think a school doesn't have its own network...





Seems you are referring to your own results as "bullsh!t" then:


There is no way the WR1043NDv1 is going to break that limit any time soon, even if you slap Linux 1.0 on it and 0 processes. Hardware has a fixed limit, simple as that.

My WDR4900v1 is doing the same speed (actually slightly lower because I use 4.1 kernel) on DD with none of your patches and tons of extra processes running. So again, what exactly is it you are complaining about? I can't see anything constructive about your post, when you consider the devs are doing a hell of a job for free and you can contribute whatever you like.

Wrong again.
1) My patches mainly focus on the MIPS architecture; PowerPC is just a recent addition. Compared to standard builds there is a mile of performance difference.
2) Just for information, the WR1043ND could exceed 300Mbps on AA. On BB it fell, and I had to do more research and testing on optimization to actually bring the performance up again. I have been keeping tabs on performance all the way from AA to CC and it is a consistent drop. I have been applying the same patches to CC with smaller gains than on BB.
Currently CC can only do 350Mbps on the WDR4300.
3) My PowerPC is now at 600Mbps.
4) There is no hardware limit to a software-processed filter table; it is about how you remove the cruft so the important processes get more CPU time, and how you optimize the ISA and code.
5) My constructive input is to bring attention to performance. There will be no change if no one actually thinks there is a problem, while people like you keep insisting everything is fine.

(Last edited by alphasparc on 28 Nov 2015, 11:27)