1 (edited by alphasparc 2012-04-16 14:51:53)

Topic: Compiler Optimization Tweaks

I read about this
http://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html

-mtune=arch
    Optimize for arch. Among other things, this option controls the way instructions are scheduled, and the perceived cost of arithmetic operations. The list of arch values is the same as for -march.

    When this option is not used, GCC will optimize for the processor specified by -march. By using -march and -mtune together, it is possible to generate code that will run on a family of processors, but optimize the code for one particular member of that family.

    `-mtune' defines the macros `_MIPS_TUNE' and `_MIPS_TUNE_foo', which work in the same way as the `-march' ones described above.

This is the gist of the thread:
By using -march and -mtune together, it is possible to generate code that will run on a family of processors, but optimize the code for one particular member of that family.

You can check which features your SoC supports by logging in to your router's console and running the following command:

root@OpenWrt:/# cat /proc/cpuinfo
system type             : Ralink RT3052   id:1 rev:3
machine                 : Aztech HW550-3G
processor               : 0
cpu model               : MIPS 24KEc V4.12
BogoMIPS                : 255.59
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 32
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0000, 0x0740, 0x0100, 0x00c0]
ASEs implemented        : mips16 dsp
shadow register sets    : 1
kscratch registers      : 0
core                    : 0
VCED exceptions         : not available
VCEI exceptions         : not available

Go to
http://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html
and see which -march value most closely matches the SoC you have.
For the example above that would be '-mtune=24kec', and you can add '-mdsp' to let the compiler use the DSP (Digital Signal Processing) ASE.
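The lookup described above can be sketched in a few lines of Python. This is only a heuristic under stated assumptions: the helper name suggest_flags is made up for illustration, and it assumes that the lowercased core name from the 'cpu model' line matches one of the values in the GCC list, which still has to be checked by hand.

```python
import re

def suggest_flags(cpuinfo_text):
    """Derive candidate GCC flags from /proc/cpuinfo text (heuristic only)."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()

    flags = []
    # "MIPS 24KEc V4.12" -> "24kec". Assumption: the lowercased core name
    # is accepted by GCC; verify it against the -march/-mtune list.
    match = re.match(r"MIPS\s+(\S+)", fields.get("cpu model", ""))
    if match:
        flags.append("-mtune=" + match.group(1).lower())

    ases = fields.get("ASEs implemented", "").split()
    if "dsp" in ases:
        flags.append("-mdsp")
    if "mt" in ases:
        flags.append("-mmt")  # not tested in this thread
    return flags

# Shortened sample taken from the cpuinfo output above.
SAMPLE = """\
system type             : Ralink RT3052   id:1 rev:3
cpu model               : MIPS 24KEc V4.12
ASEs implemented        : mips16 dsp
"""

if __name__ == "__main__":
    print(" ".join(suggest_flags(SAMPLE)))
```

For the RT3052 sample this yields the same two flags as the manual lookup, -mtune=24kec and -mdsp.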
-------------------------- In Theory Not Tested -----------------------------------------------
Some SoCs list 'mt' (Multithreading) under 'ASEs implemented'.
http://www.mips.com/products/architectures/mips-mt-ase/
In that case you can add -mmt to the flags.
You can also run 'make kernel_menuconfig' and enable multithreading in the kernel config.
---------------------------------------------------------------------------------------------------

I have been using modified -mtune flags and have observed no stability issues so far.

So I compiled two images, one with the default -mtune=mips32r2 and one with -mtune=24kc (for the TP-Link WR1043ND).
First I ran an MD5 checksum on both images to confirm that different code was emitted, and sure enough the checksums were different.
Next I ran the OpenSSL benchmark:

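Build           | OpenSSL | MD5 | SHA-1 | SHA-256 | SHA-512 | DES | 3DES | AES-128 | AES-192 | AES-256 | RSA Sign | RSA Verify | DSA Sign | DSA Verify |
-mtune=mips32r2 | 1.0.0g |  21405770 |  6854610 |  4664930 |  2842120 |  2909780 |  1034830 |  4920130 |  4270740 |  3777190 |  3.5 |  121.4 |  12.2 |  10.1 |
-mtune=24kc     | 1.0.0g |  21521180 |  6969730 |  4764010 |  2889050 |  3114950 |  1105850 |  5409050 |  4683250 |  4176140 |  4.4 |  153.3 |  15.4 |  12.6 |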

Does anyone want to confirm my results?
To verify the difference accurately:
- Keep all other variables constant: trunk version, selected packages, default configuration
- Change only the CFLAGS
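The checksum comparison mentioned above could look roughly like the sketch below. The function names are made up for illustration. Differing checksums only show that the two images are not byte-identical; images can also differ for unrelated reasons, so this is a sanity check rather than proof that the flag changed the generated code.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def images_differ(path_a, path_b):
    """True if the two files have different MD5 digests."""
    return md5_of_file(path_a) != md5_of_file(path_b)
```

You would call images_differ() with the paths of the two firmware images, one built with each -mtune value.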

Re: Compiler Optimization Tweaks

Hello, I would like to test this. Can you please explain briefly what I have to do?

make menuconfig
..
?

make

Buffalo WBMR-HP-G300H with openwrt trunk r31158 CFLAGS=-Os -pipe -mips32r2 -mtune=34kc -fno-caller-saves

3 (edited by alphasparc 2012-03-13 02:19:22)

Re: Compiler Optimization Tweaks

This is for the TP-Link WR1043ND and similar routers with the MIPS 24Kc processor only.

Edit the Makefile of the target:
/trunk/target/linux/ar71xx/Makefile

Change the line
CFLAGS:=-Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves
to
CFLAGS:=-Os -pipe -mips32r2 -mtune=24kc -fno-caller-saves

Then run make dirclean
Then run make V=99
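If you rebuild often, the one-line edit above can be scripted. The following is a minimal sketch that works on the Makefile text only; set_mtune is a hypothetical helper, and it assumes the CFLAGS line already contains a -mtune= value, as in the ar71xx example.

```python
import re

def set_mtune(makefile_text, new_tune):
    """Replace the -mtune value on CFLAGS lines, leaving other flags alone."""
    def rewrite(match):
        return re.sub(r"-mtune=\S+", "-mtune=" + new_tune, match.group(0))
    # Only lines that start with CFLAGS= or CFLAGS:= are touched.
    return re.sub(r"^CFLAGS\s*:?=.*$", rewrite, makefile_text, flags=re.MULTILINE)

ORIGINAL = "CFLAGS:=-Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves\n"

if __name__ == "__main__":
    print(set_mtune(ORIGINAL, "24kc"), end="")
```

Writing the result back to the Makefile is left out on purpose, so you can inspect the change before running make dirclean.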

Re: Compiler Optimization Tweaks

Oh, I have a Buffalo router (see sig) with a MIPS 34Kc / 333MHz  --> http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h?s[]=wbmr

My router is in the lantiq path:

ARCH:=mips
BOARD:=lantiq
BOARDNAME:=Lantiq GPON/XWAY
FEATURES:=squashfs jffs2
DEFAULT_SUBTARGET:=danube

LINUX_VERSION:=3.1.10

CFLAGS=-Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves

The CFLAGS are the same.
What do you mean? Is this not suitable for me?

Re: Compiler Optimization Tweaks

darkwin wrote:

Oh, I have a Buffalo router (see sig) with a MIPS 34Kc / 333MHz  --> http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h?s[]=wbmr

My router is in the lantiq path:

ARCH:=mips
BOARD:=lantiq
BOARDNAME:=Lantiq GPON/XWAY
FEATURES:=squashfs jffs2
DEFAULT_SUBTARGET:=danube

LINUX_VERSION:=3.1.10

CFLAGS=-Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves

The CFLAGS are the same.
What do you mean? Is this not suitable for me?

The TP-Link has a MIPS 24Kc processor, so according to the GCC documentation I can use -mtune=24kc.
Your processor is different, so you have to find the corresponding -mtune value that GCC supports.

6 (edited by Orca 2012-03-08 15:57:05)

Re: Compiler Optimization Tweaks

According to http://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html section "-march=arch":

"Generate code that will run on arch, which can be the name of a generic MIPS ISA, or the name of a particular processor. The ISA names are: `mips1', `mips2', `mips3', `mips4', `mips32', `mips32r2', `mips64' and `mips64r2'. The processor names are: `4kc', `4km', `4kp', `4ksc', `4kec', `4kem', `4kep', `4ksd', `5kc', `5kf', `20kc', `24kc', `24kf2_1', `24kf1_1', `24kec', `24kef2_1', `24kef1_1', `34kc', `34kf2_1', `34kf1_1', `74kc', `74kf2_1', `74kf1_1', `74kf3_2', `1004kc', `1004kf2_1', `1004kf1_1', `loongson2e', `loongson2f', `loongson3a', `m4k', `octeon', `octeon+', `octeon2', `orion', `r2000', `r3000', `r3900', `r4000', `r4400', `r4600', `r4650', `r6000', `r8000', `rm7000', `rm9000', `r10000', `r12000', `r14000', `r16000', `sb1', `sr71000', `vr4100', `vr4111', `vr4120', `vr4130', `vr4300', `vr5000', `vr5400', `vr5500' and `xlr'. The special value `from-abi' selects the most compatible architecture for the selected ABI (that is, `mips1' for 32-bit ABIs and `mips3' for 64-bit ABIs). "

1. What is the difference between 24kc, 24kf2_1 and 24kf1_1? Cf. CPUs

Edit: 2. How useful is the OpenSSL benchmark for determining processing power? Is it possible to see small differences in this benchmark and yet large differences in other tasks, such as routing? I recreated the OpenSSL benchmark with OpenWrt because I don't know any better. Are there alternatives for comparing the processing power of CPUs?

3. Since the images created here are target-specific, aren't they already built with that option?

Re: Compiler Optimization Tweaks

I don't foresee much improvement from trying to squeeze that tiny bit of extra processing power out of the CPU.
Basically, I think -march generates code that will run on a family of processors, while -mtune optimizes the code for one particular member of that family.

8 (edited by avbohemen 2012-03-09 14:07:36)

Re: Compiler Optimization Tweaks

Interesting... It looks like trunk development has improved performance in most tests over the last 13 months. See http://wiki.openwrt.org/inbox/benchmark.openssl?s[]=benchmark and compare build 25513, 13 months ago (I assume default compiler flags, mips32r2) to your results:

Build                 | OpenSSL | MD5 | SHA-1 | SHA-256 | SHA-512 | DES | 3DES | AES-128 | AES-192 | AES-256 | RSA Sign | RSA Verify | DSA Sign | DSA Verify |
trunk r25513 WR1043ND | 1.0.0d |  21389570 |  6779290 |  5000360 |  2556530 |  3268580 |  1166810 |  4763150 |  4130930 |  3665850 |  3.3 |  113.0 |  11.3 |  9.0  |
your -mtune=mips32r2- | 1.0.0g |  21405770 |  6854610 |  4664930 |  2842120 |  2909780 |  1034830 |  4920130 |  4270740 |  3777190 |  3.5 |  121.4 |  12.2 |  10.1 |
your -mtune=24kc ---- | 1.0.0g |  21521180 |  6969730 |  4764010 |  2889050 |  3114950 |  1105850 |  5409050 |  4683250 |  4176140 |  4.4 |  153.3 |  15.4 |  12.6 |

Unfortunately, no other information is given (like compiler version) and an older version of ssl (1.0.0d vs 1.0.0g) is used, but the numbers look like they are pretty comparable.

I will try the 24kc flag in my next build. My router is a WNDR3700v1, which also has some test results on the benchmark page.

Re: Compiler Optimization Tweaks

darkwin wrote:

Oh, I have a Buffalo router (see sig) with a MIPS 34Kc / 333MHz  --> http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h?s[]=wbmr

You can try -mtune=34kc.

orca wrote:

1. What is the difference between 24kc, 24kf2_1 and 24kf1_1? Cf. CPUs

The f variants have a hardware floating-point unit (FPU). The 24kf2_1 has an FPU clocked at half the core frequency. The 24kf1_1 has an FPU clocked 1:1 with the core.
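The naming scheme explained above can be made explicit with a small decoder. This is a sketch only: describe_variant is a made-up helper, and reading every 'fX_Y' suffix as a core-to-FPU clock ratio of X:Y is an assumption generalized from the two 24k examples. Names that do not end in 'c' or 'fX_Y' (such as 4km or 4kp) are not handled and return None.

```python
import re

def describe_variant(name):
    """Split a GCC MIPS processor name such as '24kf2_1' into its parts."""
    match = re.fullmatch(r"(\d+ke?)(?:c|f(\d+)_(\d+))", name)
    if match is None:
        return None  # name does not follow the pattern assumed here
    core, core_clk, fpu_clk = match.groups()
    has_fpu = core_clk is not None
    # Assumption: 'f2_1' means the core runs at 2x the FPU clock.
    ratio = (int(core_clk), int(fpu_clk)) if has_fpu else None
    return {"core": core, "has_fpu": has_fpu, "core_to_fpu_clock": ratio}

if __name__ == "__main__":
    for name in ("24kc", "24kf2_1", "24kf1_1"):
        print(name, describe_variant(name))
```

The routers discussed in this thread report plain 24Kc, 24KEc or 34Kc cores in /proc/cpuinfo, so the FPU variants are not the ones to pick for them.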

Re: Compiler Optimization Tweaks

Thank you, I will try it in my next build.

11 (edited by alphasparc 2012-03-13 02:41:41)

Re: Compiler Optimization Tweaks

This is my second test, this time on an RT3052F Aztech HW550-3G router:

Build                | OpenSSL | MD5 | SHA-1 | SHA-256 | SHA-512 | DES | 3DES | AES-128 | AES-192 | AES-256 | RSA Sign | RSA Verify | DSA Sign | DSA Verify |
-mtune=mips32r2      | 1.0.0g |  23318870 |  6779780 |  4492590 |  2746530 |  2800300 |  992390 |  4740100 |  4123060 |  3625920 |  3.4 |  116.8 |  11.7 |  9.6 |
-mtune=24kec -mdsp   | 1.0.0g |  23291380 |  7740820 |  4338690 |  2221870 |  3590490 |  1299590 |  6141950 |  5358820 |  4772180 |  4.2 |  146.1 |  14.7 |  12.1 |

I think -mtune really does help, especially in the RSA parts of the benchmark.
Has anyone else tried it so far?

12 (edited by avbohemen 2012-03-12 21:08:05)

Re: Compiler Optimization Tweaks

Here are my results, compared with a few benchmarks from the wiki:

Build                | OpenSSL | MD5 | SHA-1 | SHA-256 | SHA-512 | DES | 3DES | AES-128 | AES-192 | AES-256 | RSA Sign | RSA Verify | DSA Sign | DSA Verify |
wiki trunk - r26232- | 1.0.0d |  37445910 |  11610610 |  7947790 |  4891150 |  4916260 |  1774700 |  8031440 |  7067370 |  6395730 |  5.7 |  192.1 |  19.2 |  15.5 |
wiki trunk - r27984- | 1.0.0d |  36540250 |  11552240 |  7955350 |  4922860 |  4921730 |  1760040 |  8275440 |  7196270 |  6291260 |  5.6 |  191.5 |  19.2 |  15.9 |
mysys-30857-mips32r2 | 1.0.0g |  37191470 |  11748600 |  8004090 |  4862460 |  4967940 |  1773790 |  8394060 |  7272450 |  6444350 |  6.1 |  209.1 |  20.9 |  17.0 |
mysys-30909-mips24kc | 1.0.0g |  37290670 |  11891070 |  8070760 |  4896020 |  5278750 |  1886500 |  9166520 |  8014780 |  7127520 |  7.5 |  261.2 |  26.1 |  21.5 |

Improvements all the way between my two builds, from +0.3% (MD5) up to roughly +25% (RSA and DSA). Definitely worth my while.
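The percentages can be recomputed from the two 'mysys' rows with a short script. This is a sketch: the column names are taken from the header of the wiki benchmark table quoted later in this thread, and parse_row and percent_gain are names made up for illustration.

```python
COLUMNS = ["MD5", "SHA-1", "SHA-256", "SHA-512", "DES", "3DES", "AES-128",
           "AES-192", "AES-256", "RSA Sign", "RSA Verify", "DSA Sign", "DSA Verify"]

def parse_row(row):
    """Split a pipe-separated benchmark row into (label, list of numbers)."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    # cells[0] is the build label, cells[1] the OpenSSL version.
    return cells[0], [float(value) for value in cells[2:]]

def percent_gain(baseline_row, tuned_row):
    """Relative change per column, in percent, rounded to one decimal."""
    _, baseline = parse_row(baseline_row)
    _, tuned = parse_row(tuned_row)
    return {name: round((t - b) / b * 100, 1)
            for name, b, t in zip(COLUMNS, baseline, tuned)}

BASELINE = "mysys-30857-mips32r2 | 1.0.0g |  37191470 |  11748600 |  8004090 |  4862460 |  4967940 |  1773790 |  8394060 |  7272450 |  6444350 |  6.1 |  209.1 |  20.9 |  17.0 |"
TUNED = "mysys-30909-mips24kc | 1.0.0g |  37290670 |  11891070 |  8070760 |  4896020 |  5278750 |  1886500 |  9166520 |  8014780 |  7127520 |  7.5 |  261.2 |  26.1 |  21.5 |"

if __name__ == "__main__":
    for name, gain in percent_gain(BASELINE, TUNED).items():
        print(f"{name:10s} {gain:+.1f}%")
```

For these two rows the gain is about +0.3% for MD5 and about +26.5% for DSA verify, with the symmetric ciphers and the RSA and DSA signing results in between.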

Re: Compiler Optimization Tweaks

Yea! Squeeze all the performance you can get out of these SoCs! lol

Re: Compiler Optimization Tweaks

Hi,

I tried it too, and it gives me about +10% with AES encryption, so this may be helpful for everyone who runs OpenVPN. Sadly, Blowfish doesn't benefit from the GCC optimization at all. It is still faster than AES, but everyone who sleeps better using AES would really benefit from the optimization.

I tried changing -march instead of -mtune, and the performance is the same. The file size might be smaller when building only for 24kc instead of building for all mips32r2 CPUs and merely tuning for 24kc, but it won't be much (the image was the same size), and you run the risk that the firmware doesn't run if you choose the wrong processor type. So it's probably not worth the risk of bricking your router. I tried it on my WNDR3700, which has a working TFTP server in the bootloader for reflashing; I wouldn't try it on something where I would have to start soldering if something went wrong.

15 (edited by darkwin 2012-03-17 07:31:08)

Re: Compiler Optimization Tweaks

Hi, I have tried this on my Buffalo WBMR-HP-G300H with OpenWrt trunk r30952 and CFLAGS=-Os -pipe -mips32r2 -mtune=34kc -fno-caller-saves.

| trunk r30624 -mtune=mips32r2 | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0g |  17321240 |  5692940 |  3833600 |  2371060 |  2384510 |  858720 |  4028600 |  3525280 |  3116620 |  2.9 |  100.1 |  10.3 |  8.3 |
| trunk r30952 -mtune=34kc     | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0h |  15548440 |  5120720 |  3521190 |  2154730 |  2317510 |  855040 |  3953040 |  3508860 |  3138600 |  3.4 |  112.4 |  11.5 |  9.2 |

16 (edited by alphasparc 2012-03-17 11:53:33)

Re: Compiler Optimization Tweaks

darkwin wrote:

Hi, I have tried this on my Buffalo WBMR-HP-G300H with OpenWrt trunk r30952 and CFLAGS=-Os -pipe -mips32r2 -mtune=34kc -fno-caller-saves.

| trunk r30624 -mtune=mips32r2 | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0g |  17321240 |  5692940 |  3833600 |  2371060 |  2384510 |  858720 |  4028600 |  3525280 |  3116620 |  2.9 |  100.1 |  10.3 |  8.3 |
| trunk r30952 -mtune=34kc     | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0h |  15548440 |  5120720 |  3521190 |  2154730 |  2317510 |  855040 |  3953040 |  3508860 |  3138600 |  3.4 |  112.4 |  11.5 |  9.2 |

Do you have a serial TTL converter?
If so, you can try adding -mmt and enabling multithreading in make kernel_menuconfig.
With serial access, recovering the router (if anything goes wrong) is simple.
Then run the benchmark again.
I am really interested in the performance difference when multithreading is enabled.

Re: Compiler Optimization Tweaks

Hi, yes, I have a serial port :). Where can I find this option? Do you mean CFLAGS=-Os -pipe -mips32r2 -mtune=34kc -fno-caller-saves -mmt ?

I have a fast server for compiling my images. Is it a good idea to enable this option under Global build settings? -->  Compile certain packages parallelized

Re: Compiler Optimization Tweaks

OK, I found the multithreading support. Which is the right option?

(X) Disable multithreading support.                 
( ) Use 1 TC on each available VPE for SMP     
( ) SMTC: Use all TCs on all VPEs for SMP

And where can I find the -mmt option?

19 (edited by alphasparc 2012-04-16 14:48:26)

Re: Compiler Optimization Tweaks

darkwin wrote:

Hi, yes, I have a serial port :). Where can I find this option? Do you mean CFLAGS=-Os -pipe -mips32r2 -mtune=34kc -fno-caller-saves -mmt ?

I have a fast server for compiling my images. Is it a good idea to enable this option under Global build settings? -->  Compile certain packages parallelized

Yes.
After adding the flag, don't build yet.
Run 'make kernel_menuconfig'.
Then find the multithreading option.
Enable it, save, and then build.

20 (edited by darkwin 2012-03-18 08:54:46)

Re: Compiler Optimization Tweaks

Hello, I have repeated the test because I made a mistake.
When I ran the earlier test, I had many processes running and the load was high.
Here are the corrected test results:

| OS | SoC | Device | CPU | BogoMIPS | OpenSSL Ver. | MD5 | SHA-1 | SHA-256 | SHA-512 | DES | 3DES | AES-128 | AES-192 | AES-256 | RSA Sign | RSA Verify | DSA Sign | DSA Verify |
| trunk r30624            | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0g |  17321240 |  5692940 |  3833600 |  2371060 |  2384510 |  858720 |  4028600 |  3525280 |  3116620 |  2.9   |  100.1    |  10.3   |  8.3 |
| trunk r30952            | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0h |  16280900 |  5305860 |  3608650 |  2382960 |  2276650 |  804220 |  3800950 |  3333260 |  2939890 |  2.8   |  94.0     |  9.4    |  7.6 |
| trunk r30952 mtune_34kc | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0h |  15770680 |  5301220 |  3676050 |  2247470 |  2386710 |  860300 |  4310480 |  3612700 |  3194020 |  3.4   |  118.1    |  11.8   |  9.5 |

Next I will test multithreading. Before that, however, I want to find out how I can flash my router via TFTP.
A friend of mine has the same router and it doesn't work for him.

Re: Compiler Optimization Tweaks

Is this tweak still valid?

22 (edited by StefanHamminga 2012-05-07 19:16:04)

Re: Compiler Optimization Tweaks

I've done some testing with this too. At the cost of somewhat higher memory consumption, we can even change from -Os to other, more aggressive optimizations:

Here are my results for the TP-Link TL-WR1043ND:

Build           | OpenSSL | MD5 | SHA-1 | SHA-256 | SHA-512 | DES | 3DES | AES-128 | AES-192 | AES-256 | RSA Sign | RSA Verify | DSA Sign | DSA Verify |
-mtune=mips32r2 | 1.0.0g |  21405770 |  6854610 |  4664930 |  2842120 |  2909780 |  1034830 |  4920130 |  4270740 |  3777190 |  3.5 |  121.4 |  12.2 |  10.1 |
-mtune=24kc     | 1.0.0g |  21521180 |  6969730 |  4764010 |  2889050 |  3114950 |  1105850 |  5409050 |  4683250 |  4176140 |  4.4 |  153.3 |  15.4 |  12.6 |
trunk r31288    | 1.0.1  |  20927180 |  7354180 |  5393760 |  2393340 |  3450330 |  1221610 |  5551770 |  4827330 |  4285530 |  4.4 |  153.5 |  15.4 |  12.8 |
trunk r31316    | 1.0.1  |  20875860 |  7353560 |  5348350 |  2387740 |  3461670 |  1241560 |  5526860 |  4816590 |  4255250 |  4.4 |  153.1 |  15.3 |  12.7 |
trunk r31586    | 1.0.1b |  21328410 |  7028050 |  5500980 |  2607950 |  3429030 |  1249210 |  5451780 |  4794120 |  4186790 |  4.3 |  146.1 |  14.6 |  12.1 |
trunk r31639    | 1.0.1b |  21164050 |  7105800 |  5000820 |  2982440 |  3127670 |  1119530 |  5381990 |  4700130 |  4163810 |  4.4 |  152.2 |  15.3 |  12.4 |

Trunk r31288 to r31586 were built with the options:

-Ofast -pipe -mips32r2 -march=24kc -mtune=24kc -fno-caller-saves

Trunk r31639 was built with:

-Os -pipe -mips32r2 -march=24kc -mtune=24kc -fno-caller-saves

Re: Compiler Optimization Tweaks

Hi,
I did the same and there seems to be no improvement. Tested yesterday.

Backfire (10.03.1, r29592) ----------- Linux OpenWrt 2.6.32.27 -mtune=mips32r2
| 0.9.8r |  37378730 |  11666090 |  8460290 |  4243800 |  5357230 |  1920680 |  8272550 |  7195650 |  6370990 |  6.7 |  245.1 |  24.9 |  20.9 |
Bleeding Edge, r32712 ---------------- Linux OpenWrt 3.3.8 -mtune=24kc -mmt
| 1.0.1c |  36958210 |  11898330 |  8114600 |  4920660 |  5305210 |  1889780 |  9155240 |  8003300 |  7091630 |  7.5 |  260.7 |  26.1 |  21.1 |

root@OpenWrt:~# cat /proc/cpuinfo
system type        : Atheros AR7161 rev 2
machine            : D-Link DIR-825 rev. B1
processor        : 0
cpu model        : MIPS 24Kc V7.4
BogoMIPS        : 452.19
wait instruction    : yes
microsecond timers    : yes
tlb_entries        : 16
extra interrupt vector    : yes
hardware watchpoint    : yes, count: 4, address/irw mask: [0x0000, 0x0ff8, 0x0ff8, 0x0ff8]
ASEs implemented    : mips16
shadow register sets    : 1
kscratch registers    : 0
core            : 0
VCED exceptions        : not available
VCEI exceptions        : not available

Re: Compiler Optimization Tweaks

You are comparing different kernels.

What is -mmt for?

Re: Compiler Optimization Tweaks

glococo wrote:

Hi,
I did the same and there seems to be no improvement. Tested yesterday.

Backfire (10.03.1, r29592) ----------- Linux OpenWrt 2.6.32.27 -mtune=mips32r2
| 0.9.8r |  37378730 |  11666090 |  8460290 |  4243800 |  5357230 |  1920680 |  8272550 |  7195650 |  6370990 |  6.7 |  245.1 |  24.9 |  20.9 |
Bleeding Edge, r32712 ---------------- Linux OpenWrt 3.3.8 -mtune=24kc -mmt
| 1.0.1c |  36958210 |  11898330 |  8114600 |  4920660 |  5305210 |  1889780 |  9155240 |  8003300 |  7091630 |  7.5 |  260.7 |  26.1 |  21.1 |

You have about a 10% improvement on AES; I wouldn't say that's nothing.