OpenWrt Forum Archive

Topic: Compiler Optimization Tweaks

The content of this topic has been archived between 27 Apr 2018 and 29 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

I read about this
http://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html

-mtune=arch
    Optimize for arch. Among other things, this option controls the way instructions are scheduled, and the perceived cost of arithmetic operations. The list of arch values is the same as for -march.

    When this option is not used, GCC will optimize for the processor specified by -march. By using -march and -mtune together, it is possible to generate code that will run on a family of processors, but optimize the code for one particular member of that family.

    `-mtune' defines the macros `_MIPS_TUNE' and `_MIPS_TUNE_foo', which work in the same way as the `-march' ones described above.

This is the gist of the thread:
By using -march and -mtune together, it is possible to generate code that will run on a family of processors, but optimize the code for one particular member of that family.

You can check which features your SoC supports by logging in to your router's console and running the following command:

root@OpenWrt:/# cat /proc/cpuinfo
system type             : Ralink RT3052   id:1 rev:3
machine                 : Aztech HW550-3G
processor               : 0
cpu model               : MIPS 24KEc V4.12
BogoMIPS                : 255.59
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 32
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0000, 0x0740, 0x0100, 0x00c0]
ASEs implemented        : mips16 dsp
shadow register sets    : 1
kscratch registers      : 0
core                    : 0
VCED exceptions         : not available
VCEI exceptions         : not available

Go to
http://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html
See which arch value most closely matches the SoC you have.
For the example above that would be '-mtune=24kec', and you can add '-mdsp' to make use of the DSP (Digital Signal Processing) ASE.
-------------------------- In Theory Not Tested -----------------------------------------------
Some SoCs list 'mt' (Multithreading) among the ASEs.
http://www.mips.com/products/architectures/mips-mt-ase/
You can add -mmt to the flags.
You can also run 'make kernel_menuconfig' and enable multithreading in the kernel config.
---------------------------------------------------------------------------------------------------
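The lookup described above (read the cpu model and the ASEs from /proc/cpuinfo, then pick matching flags) can be sketched roughly as follows. This is only an illustration: the function name `suggest_mips_flags` is made up here, and the suggested value still has to be checked by hand against the arch list in the GCC documentation. The untested -mmt case is deliberately left out.

```python
import re


def suggest_mips_flags(cpuinfo_text):
    """Return candidate GCC flags derived from /proc/cpuinfo text.

    Hypothetical helper: the result is a suggestion only and must be
    compared with the arch values listed in the GCC MIPS options page.
    """
    flags = []
    model = re.search(r"^cpu model\s*:\s*MIPS\s+(\S+)", cpuinfo_text, re.MULTILINE)
    if model:
        # "24KEc" becomes "24kec", the spelling GCC uses for arch names.
        flags.append("-mtune=" + model.group(1).lower())
    ases = re.search(r"^ASEs implemented\s*:\s*(.*)$", cpuinfo_text, re.MULTILINE)
    if ases and "dsp" in ases.group(1).split():
        flags.append("-mdsp")
    return flags


# Shortened copy of the RT3052 output shown above.
SAMPLE = """\
system type             : Ralink RT3052   id:1 rev:3
cpu model               : MIPS 24KEc V4.12
ASEs implemented        : mips16 dsp
"""

if __name__ == "__main__":
    print(" ".join(suggest_mips_flags(SAMPLE)))
```

On a build host you could paste the router's cpuinfo output into a file and pass its contents to the function instead of `SAMPLE`.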

I have been using modified -mtune flags, and so far I have observed no stability issues.

So I compiled two images, one with the normal -mtune=mips32r2 and one with -mtune=24kc (for the TP-Link WR1043ND).
First I compared MD5 checksums to confirm that different code was emitted, and sure enough the checksums were different.
Next I ran the OpenSSL benchmark:

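Build           | OpenSSL |      MD5 |   SHA-1 | SHA-256 | SHA-512 |     DES |    3DES | AES-128 | AES-192 | AES-256 | RSA Sign | RSA Verify | DSA Sign | DSA Verify |
-mtune=mips32r2 | 1.0.0g  | 21405770 | 6854610 | 4664930 | 2842120 | 2909780 | 1034830 | 4920130 | 4270740 | 3777190 |      3.5 |      121.4 |     12.2 |       10.1 |
-mtune=24kc     | 1.0.0g  | 21521180 | 6969730 | 4764010 | 2889050 | 3114950 | 1105850 | 5409050 | 4683250 | 4176140 |      4.4 |      153.3 |     15.4 |       12.6 |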

Does anyone want to confirm my results?
To verify the difference accurately:
- Keep all other variables constant, such as trunk version, selected packages and default configuration
- Change only the CFLAGS

(Last edited by alphasparc on 16 Apr 2012, 14:51)

Hello, I would like to test. Can you please explain briefly what I have to do?

Make menuconfig
..
?

Make

This is for the TP-Link WR1043ND and similar routers with the MIPS 24Kc processor only.

Edit the Makefile of the target:
/trunk/target/linux/ar71xx/Makefile <- This file

The line below
CFLAGS:=-Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves
Change to
CFLAGS:=-Os -pipe -mips32r2 -mtune=24kc -fno-caller-saves

Then make dirclean
Then make V=99
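If you rebuild often, the manual edit above can be scripted. The following is a minimal sketch, assuming the CFLAGS line looks like the one quoted above; the helper name `retune_cflags` is made up for this example, and you would still read the Makefile, write the result back yourself, and run make dirclean afterwards.

```python
import re


def retune_cflags(makefile_text, new_tune):
    """Replace the -mtune value on CFLAGS lines.

    Returns (new_text, number_of_lines_changed). Handles both
    'CFLAGS:=' and 'CFLAGS=' assignments; nothing else is touched.
    """
    pattern = re.compile(r"^(CFLAGS\s*:?=.*?-mtune=)\S+", re.MULTILINE)
    return pattern.subn(lambda match: match.group(1) + new_tune, makefile_text)


# The CFLAGS line from the ar71xx target Makefile quoted above.
SAMPLE = "CFLAGS:=-Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves\n"

if __name__ == "__main__":
    new_text, changed = retune_cflags(SAMPLE, "24kc")
    print(changed, "line(s) changed")
    print(new_text, end="")
```

Only the value after -mtune= is rewritten, so -mips32r2 and the other flags stay exactly as they were.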

(Last edited by alphasparc on 13 Mar 2012, 02:19)

Oh, I have a Buffalo router (see my signature) with a MIPS 34Kc / 333 MHz --> http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h?s[]=wbmr

My router is in the lantiq target path:

ARCH:=mips
BOARD:=lantiq
BOARDNAME:=Lantiq GPON/XWAY
FEATURES:=squashfs jffs2
DEFAULT_SUBTARGET:=danube

LINUX_VERSION:=3.1.10

CFLAGS=-Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves

The CFLAGS are the same.
Do you mean that this is not good for me?

darkwin wrote:

Oh, I have a Buffalo router (see my signature) with a MIPS 34Kc / 333 MHz --> http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h?s[]=wbmr

My router is in the lantiq target path:

ARCH:=mips
BOARD:=lantiq
BOARDNAME:=Lantiq GPON/XWAY
FEATURES:=squashfs jffs2
DEFAULT_SUBTARGET:=danube

LINUX_VERSION:=3.1.10

CFLAGS=-Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves

The CFLAGS are the same.
Do you mean that this is not good for me?

The TP-Link has a MIPS 24Kc processor, so according to the GCC documentation I can use -mtune=24kc.
Your processor is different, so you have to find the corresponding value supported by GCC.

According to http://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html section "-march=arch":

"Generate code that will run on arch, which can be the name of a generic MIPS ISA, or the name of a particular processor. The ISA names are: `mips1', `mips2', `mips3', `mips4', `mips32', `mips32r2', `mips64' and `mips64r2'. The processor names are: `4kc', `4km', `4kp', `4ksc', `4kec', `4kem', `4kep', `4ksd', `5kc', `5kf', `20kc', `24kc', `24kf2_1', `24kf1_1', `24kec', `24kef2_1', `24kef1_1', `34kc', `34kf2_1', `34kf1_1', `74kc', `74kf2_1', `74kf1_1', `74kf3_2', `1004kc', `1004kf2_1', `1004kf1_1', `loongson2e', `loongson2f', `loongson3a', `m4k', `octeon', `octeon+', `octeon2', `orion', `r2000', `r3000', `r3900', `r4000', `r4400', `r4600', `r4650', `r6000', `r8000', `rm7000', `rm9000', `r10000', `r12000', `r14000', `r16000', `sb1', `sr71000', `vr4100', `vr4111', `vr4120', `vr4130', `vr4300', `vr5000', `vr5400', `vr5500' and `xlr'. The special value `from-abi' selects the most compatible architecture for the selected ABI (that is, `mips1' for 32-bit ABIs and `mips3' for 64-bit ABIs). "

1. What is the difference between 24kc, 24kf2_1 and 24kf1_1? Cf. CPUs

Edit: 2. How useful is the OpenSSL benchmark for determining processing power? Is it possible to have small differences in this benchmark and yet huge differences in performing other tasks, such as routing? I recreated the OpenSSL benchmark with OpenWrt because I don't know any better. Are there alternatives for comparing the processing power of CPUs?

3. Since the images created here are target specific, aren't they already built with that option?

(Last edited by Orca on 8 Mar 2012, 15:57)

I don't foresee much improvement from trying to squeeze that tiny bit of extra processing juice out of the CPU.
Basically, I think -march generates code that will run on a family of processors, while -mtune optimizes the code for one particular member of that family.

Interesting... It looks like trunk development has improved performance in most tests over the last 13 months. See http://wiki.openwrt.org/inbox/benchmark.openssl?s[]=benchmark and compare build 25513, 13 months ago (I assume default compiler flags, mips32r2) to your results:

trunk r25513 WR1043ND | 1.0.0d |  21389570 |  6779290 |  5000360 |  2556530 |  3268580 |  1166810 |  4763150 |  4130930 |  3665850 |  3.3 |  113.0 |  11.3 |  9.0  |
your -mtune=mips32r2- | 1.0.0g |  21405770 |  6854610 |  4664930 |  2842120 |  2909780 |  1034830 |  4920130 |  4270740 |  3777190 |  3.5 |  121.4 |  12.2 |  10.1 |
your -mtune=24kc ---- | 1.0.0g |  21521180 |  6969730 |  4764010 |  2889050 |  3114950 |  1105850 |  5409050 |  4683250 |  4176140 |  4.4 |  153.3 |  15.4 |  12.6 |

Unfortunately, no other information is given (like compiler version) and an older version of ssl (1.0.0d vs 1.0.0g) is used, but the numbers look like they are pretty comparable.

I will try the 24kc flag in my next build. My router is a WNDR3700v1, which also has some test results on the benchmark page.

(Last edited by avbohemen on 9 Mar 2012, 14:07)

darkwin wrote:

Oh, I have a Buffalo router (see my signature) with a MIPS 34Kc / 333 MHz --> http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h?s[]=wbmr

You can try -mtune=34kc.

orca wrote:

1. What is the difference between 24kc, 24kf2_1 and 24kf1_1? Cf. CPUs

The f-variants have a hardware floating point unit. The 24kf2_1 has a half-clocked fpu. The 24kf1_1 has a 1:1 clocked fpu.

Thank you, I will try it in my next build.

This is my second test, this time on an RT3052F Aztech HW550-3G router:

-mtune=mips32r2    | 1.0.0g |  23318870 |  6779780 |  4492590 |  2746530 |  2800300 |  992390 |  4740100 |  4123060 |  3625920 |  3.4 |  116.8 |  11.7 |  9.6 |
-mtune=24kec -mdsp | 1.0.0g |  23291380 |  7740820 |  4338690 |  2221870 |  3590490 |  1299590 |  6141950 |  5358820 |  4772180 |  4.2 |  146.1 |  14.7 |  12.1 |

I think -mtune really does help, especially in the RSA parts of the benchmark.
Has anyone else tried it so far?

(Last edited by alphasparc on 13 Mar 2012, 02:41)

Here are my results, compared with a few benchmarks from the wiki:

wiki trunk - r26232- | 1.0.0d |  37445910 |  11610610 |  7947790 |  4891150 |  4916260 |  1774700 |  8031440 |  7067370 |  6395730 |  5.7 |  192.1 |  19.2 |  15.5 |
wiki trunk - r27984- | 1.0.0d |  36540250 |  11552240 |  7955350 |  4922860 |  4921730 |  1760040 |  8275440 |  7196270 |  6291260 |  5.6 |  191.5 |  19.2 |  15.9 |
mysys-30857-mips32r2 | 1.0.0g |  37191470 |  11748600 |  8004090 |  4862460 |  4967940 |  1773790 |  8394060 |  7272450 |  6444350 |  6.1 |  209.1 |  20.9 |  17.0 |
mysys-30909-mips24kc | 1.0.0g |  37290670 |  11891070 |  8070760 |  4896020 |  5278750 |  1886500 |  9166520 |  8014780 |  7127520 |  7.5 |  261.2 |  26.1 |  21.5 |

Improvements all the way, from +0.3% up to +25%. Definitely worth my while.
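For anyone who wants to recompute such percentages from the pipe-delimited rows, here is a small sketch. The function names are illustrative, and the column names are taken from the table header posted later in this thread; the two rows are the r30857 and r30909 results above.

```python
# Column names as given in the table header posted later in the thread.
COLUMNS = [
    "MD5", "SHA-1", "SHA-256", "SHA-512", "DES", "3DES",
    "AES-128", "AES-192", "AES-256",
    "RSA Sign", "RSA Verify", "DSA Sign", "DSA Verify",
]


def parse_row(row):
    """Split a benchmark row into (label, openssl_version, values)."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return cells[0], cells[1], [float(value) for value in cells[2:]]


def percent_gains(baseline_row, tuned_row):
    """Relative change per column, in percent, tuned versus baseline."""
    _, _, base = parse_row(baseline_row)
    _, _, tuned = parse_row(tuned_row)
    return {name: (t - b) / b * 100.0 for name, b, t in zip(COLUMNS, base, tuned)}


BASELINE = "mysys-30857-mips32r2 | 1.0.0g |  37191470 |  11748600 |  8004090 |  4862460 |  4967940 |  1773790 |  8394060 |  7272450 |  6444350 |  6.1 |  209.1 |  20.9 |  17.0 |"
TUNED = "mysys-30909-mips24kc | 1.0.0g |  37290670 |  11891070 |  8070760 |  4896020 |  5278750 |  1886500 |  9166520 |  8014780 |  7127520 |  7.5 |  261.2 |  26.1 |  21.5 |"

if __name__ == "__main__":
    for name, gain in percent_gains(BASELINE, TUNED).items():
        print(f"{name:10s} {gain:+6.1f}%")
```

With these two rows the smallest gain is on MD5 and the largest on the RSA/DSA columns, which roughly matches the range quoted above.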

(Last edited by avbohemen on 12 Mar 2012, 21:08)

Yea! Squeeze all the performance you can get out of these SoCs! lol

Hi,

I tried it too, and it gives me about +10% with AES encryption, so this may be helpful for everyone who runs OpenVPN. Sadly, Blowfish doesn't benefit from the GCC optimization at all. It is still faster than AES, but everyone who sleeps better using AES would really benefit from the optimization.

I tried changing -march instead of -mtune, and the performance is the same. The file size might be slightly smaller when building only for 24kc instead of building for all mips32r2 CPUs and merely tuning for 24kc, but it won't be much (my image was the same size), and you run the risk that the firmware doesn't run if you choose the wrong processor type. So it's probably not worth the risk of bricking your router. I tried it on my WNDR3700, which has a working TFTP server in the bootloader for reflashing; I wouldn't try it on something where I would have to start soldering if something went wrong.

Hi, I have tried this on my Buffalo WBMR-HP-G300H with OpenWrt trunk r30952 and CFLAGS=-Os -pipe -mips32r2 -mtune=34kc -fno-caller-saves.

| trunk r30624 -mtune=mips32r2 | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0g |  17321240 |  5692940 |  3833600 |  2371060 |  2384510 |  858720 |  4028600 |  3525280 |  3116620 |  2.9 |  100.1 |  10.3 |  8.3 |
| trunk r30952 -mtune=34kc     | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0h |  15548440 |  5120720 |  3521190 |  2154730 |  2317510 |  855040 |  3953040 |  3508860 |  3138600 |  3.4 |  112.4 |  11.5 |  9.2 |

(Last edited by darkwin on 17 Mar 2012, 07:31)

darkwin wrote:

Hi, I have tried this on my Buffalo WBMR-HP-G300H with OpenWrt trunk r30952 and CFLAGS=-Os -pipe -mips32r2 -mtune=34kc -fno-caller-saves.

| trunk r30624 -mtune=mips32r2 | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0g |  17321240 |  5692940 |  3833600 |  2371060 |  2384510 |  858720 |  4028600 |  3525280 |  3116620 |  2.9 |  100.1 |  10.3 |  8.3 |
| trunk r30952 -mtune=34kc     | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0h |  15548440 |  5120720 |  3521190 |  2154730 |  2317510 |  855040 |  3953040 |  3508860 |  3138600 |  3.4 |  112.4 |  11.5 |  9.2 |

Do you have a serial TTL converter?
If you do, you can try adding -mmt and enabling multithreading in make kernel_menuconfig.
With serial access, recovering the router (if anything goes wrong) is simple.
Then benchmark again.
I am really interested in the performance difference when multithreading is enabled.

(Last edited by alphasparc on 17 Mar 2012, 11:53)

Hi, yes, I have a serial port. Where can I find this option? Do you mean CFLAGS=-Os -pipe -mips32r2 -mtune=34kc -fno-caller-saves -mmt ?

I have a fast server for compiling my images. Is it a good idea to enable this option under Global build settings? -->  Compile certain packages parallelized

OK, I found the multithreading support. Which is the right option?

(X) Disable multithreading support.                 
( ) Use 1 TC on each available VPE for SMP     
( ) SMTC: Use all TCs on all VPEs for SMP

And where can I find the -mmt option?

darkwin wrote:

Hi, yes, I have a serial port. Where can I find this option? Do you mean CFLAGS=-Os -pipe -mips32r2 -mtune=34kc -fno-caller-saves -mmt ?

I have a fast server for compiling my images. Is it a good idea to enable this option under Global build settings? -->  Compile certain packages parallelized

Yes.
After adding the flag, don't build yet.
Type 'make kernel_menuconfig'.
Then find the multithreading option.
Enable it, save, then build.

(Last edited by alphasparc on 16 Apr 2012, 14:48)

Hello, I have repeated the test because I made a mistake.
When I ran the earlier test I had many processes running and the load was high.
Here are the corrected test results:

| OS | SoC | Device | CPU | BogoMIPS | OpenSSL Ver. | MD5 | SHA-1 | SHA-256 | SHA-512 | DES | 3DES | AES-128 | AES-192 | AES-256 | RSA Sign | RSA Verify | DSA Sign | DSA Verify |
| trunk r30624            | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0g |  17321240 |  5692940 |  3833600 |  2371060 |  2384510 |  858720 |  4028600 |  3525280 |  3116620 |  2.9   |  100.1    |  10.3   |  8.3 |
| trunk r30952            | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0h |  16280900 |  5305860 |  3608650 |  2382960 |  2276650 |  804220 |  3800950 |  3333260 |  2939890 |  2.8   |  94.0     |  9.4    |  7.6 |
| trunk r30952 mtune_34kc | Atheros AR9 rev 1.2 | [[http://wiki.openwrt.org/toh/buffalo/wbmr-hp-g300h|Buffalo WBMR-HP-G300H]] | MIPS 34Kc V4.12 | 221.18 | 1.0.0h |  15770680 |  5301220 |  3676050 |  2247470 |  2386710 |  860300 |  4310480 |  3612700 |  3194020 |  3.4   |  118.1    |  11.8   |  9.5 |

Next I will test multithreading. First, however, I want to find out how I can flash my router via TFTP.
A friend of mine has the same router and it doesn't work for him.
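Since background load spoiled the first run, it can help to check the load average before starting a benchmark. A rough sketch, assuming Python is available where you run it (otherwise just read /proc/loadavg by hand); the 0.2 threshold and the function names are arbitrary choices for this example:

```python
def one_minute_load(loadavg_text):
    """Return the 1-minute load average from /proc/loadavg contents."""
    return float(loadavg_text.split()[0])


def idle_enough(loadavg_text, threshold=0.2):
    """True if the 1-minute load is at or below the (arbitrary) threshold."""
    return one_minute_load(loadavg_text) <= threshold


if __name__ == "__main__":
    try:
        with open("/proc/loadavg") as handle:
            text = handle.read()
    except OSError:
        # Sample line so the sketch also runs on hosts without /proc.
        text = "0.05 0.10 0.15 1/50 1234\n"
    print("ok to benchmark" if idle_enough(text) else "system busy, wait")
```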

(Last edited by darkwin on 18 Mar 2012, 08:54)

Is this tweak still valid?

I've done some testing with this too. We can, at the cost of somewhat higher memory consumption, even change from -Os to other, more aggressive optimizations.

Here are my results for the TP-Link TL-WR1043ND:

-mtune=mips32r2 | 1.0.0g |  21405770 |  6854610 |  4664930 |  2842120 |  2909780 |  1034830 |  4920130 |  4270740 |  3777190 |  3.5 |  121.4 |  12.2 |  10.1 |
-mtune=24kc    | 1.0.0g |  21521180 |  6969730 |  4764010 |  2889050 |  3114950 |  1105850 |  5409050 |  4683250 |  4176140 |  4.4 |  153.3 |  15.4 |  12.6 |
trunk r31288   | 1.0.1  |  20927180 |  7354180 |  5393760 |  2393340 |  3450330 |  1221610 |  5551770 |  4827330 |  4285530 |  4.4 |  153.5 |  15.4 |  12.8 |
trunk r31316   | 1.0.1  |  20875860 |  7353560 |  5348350 |  2387740 |  3461670 |  1241560 |  5526860 |  4816590 |  4255250 |  4.4 |  153.1 |  15.3 |  12.7 |
trunk r31586   | 1.0.1b |  21328410 |  7028050 |  5500980 |  2607950 |  3429030 |  1249210 |  5451780 |  4794120 |  4186790 |  4.3 |  146.1 |  14.6 |  12.1 |
trunk r31639   | 1.0.1b |  21164050 |  7105800 |  5000820 |  2982440 |  3127670 |  1119530 |  5381990 |  4700130 |  4163810 |  4.4 |  152.2 |  15.3 |  12.4 |

Trunk r31288 to r31586 were built with the options:

-Ofast -pipe -mips32r2 -march=24kc -mtune=24kc -fno-caller-saves

Trunk r31639 was built with:

-Os -pipe -mips32r2 -march=24kc -mtune=24kc -fno-caller-saves

(Last edited by StefanHamminga on 7 May 2012, 19:16)

Hi,
I did the same and there seems to be no improvement. Tested yesterday.

Backfire (10.03.1, r29592) ----------- Linux OpenWrt 2.6.32.27 -mtune=mips32r2
| 0.9.8r |  37378730 |  11666090 |  8460290 |  4243800 |  5357230 |  1920680 |  8272550 |  7195650 |  6370990 |  6.7 |  245.1 |  24.9 |  20.9 |
Bleeding Edge, r32712 ---------------- Linux OpenWrt 3.3.8 -mtune=24kc -mmt
| 1.0.1c |  36958210 |  11898330 |  8114600 |  4920660 |  5305210 |  1889780 |  9155240 |  8003300 |  7091630 |  7.5 |  260.7 |  26.1 |  21.1 |

root@OpenWrt:~# cat /proc/cpuinfo
system type        : Atheros AR7161 rev 2
machine            : D-Link DIR-825 rev. B1
processor        : 0
cpu model        : MIPS 24Kc V7.4
BogoMIPS        : 452.19
wait instruction    : yes
microsecond timers    : yes
tlb_entries        : 16
extra interrupt vector    : yes
hardware watchpoint    : yes, count: 4, address/irw mask: [0x0000, 0x0ff8, 0x0ff8, 0x0ff8]
ASEs implemented    : mips16
shadow register sets    : 1
kscratch registers    : 0
core            : 0
VCED exceptions        : not available
VCEI exceptions        : not available

You are comparing different kernels.

What is -mmt for?

glococo wrote:

Hi,
I did the same and there seems to be no improvement. Tested yesterday.

Backfire (10.03.1, r29592) ----------- Linux OpenWrt 2.6.32.27 -mtune=mips32r2
| 0.9.8r |  37378730 |  11666090 |  8460290 |  4243800 |  5357230 |  1920680 |  8272550 |  7195650 |  6370990 |  6.7 |  245.1 |  24.9 |  20.9 |
Bleeding Edge, r32712 ---------------- Linux OpenWrt 3.3.8 -mtune=24kc -mmt
| 1.0.1c |  36958210 |  11898330 |  8114600 |  4920660 |  5305210 |  1889780 |  9155240 |  8003300 |  7091630 |  7.5 |  260.7 |  26.1 |  21.1 |

You have about a 10% improvement on AES; I wouldn't say that's nothing.