Thumb-2 code is denser than pure ARM code, which reduces RAM usage and improves
performance thanks to a smaller instruction cache footprint.
There's no reason not to enable this on the other ARMv7 targets
(cortex-a7 and cortex-a8), but I don't have the hardware to test them.
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
---
include/target.mk | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/target.mk b/include/target.mk
index a2ceb7f783..dfc6f4e480 100644
--- a/include/target.mk
+++ b/include/target.mk
@@ -196,6 +196,9 @@ ifeq ($(DUMP),1)
CPU_TYPE = sparc
CPU_CFLAGS_ultrasparc = -mcpu=ultrasparc
endif
+ ifeq ($(ARCH),arm)
+ CPU_CFLAGS_cortex-a9 = -mthumb
+ endif
ifeq ($(ARCH),aarch64)
CPU_TYPE ?= generic
CPU_CFLAGS_generic = -mcpu=generic
(Sending as RFC due to the note below.)
The Thumb-2 instruction set yields denser code, allowing more efficient
use of the instruction cache and, consequently, better performance.
Uncompressed vmlinux size comparison for my personal configuration (Linux
5.4.46, compiled with gcc 9.3.0 and binutils 2.34):
Pure ARM: 24243392 bytes
Thumb-2:  22102716 bytes
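
For reference, the saving implied by these two figures can be worked out as follows. This is a minimal sketch; the variable names are illustrative only, and the numbers describe the uncompressed vmlinux, not the compressed image that ends up in flash.

```python
# Uncompressed vmlinux sizes quoted above, in bytes.
arm_bytes = 24243392
thumb2_bytes = 22102716

# Absolute and relative size reduction of the Thumb-2 build.
saved_bytes = arm_bytes - thumb2_bytes
saved_percent = 100 * saved_bytes / arm_bytes

print(f"Thumb-2 saves {saved_bytes} bytes ({saved_percent:.1f}% of the ARM build)")
```

That is a reduction of 2140676 bytes, or roughly 8.8%.
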
NOTE: This requires enabling a linker bug workaround to avoid the emission of
R_ARM_THM_JUMP11 relocations [1] in modules, which the kernel doesn't support.
Since the workaround effectively implies -fno-optimize-sibling-calls [2], the
resulting code is suboptimal. While compat (and in-tree) modules load and run
correctly without the workaround, WireGuard fails to load with an unknown
relocation 102 error (102 being the code for R_ARM_THM_JUMP11).
[1] https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf (page 28)
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm/Makefile?h=linux-5.4.y#n129
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
---
target/linux/mvebu/cortexa9/config-5.4 | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 target/linux/mvebu/cortexa9/config-5.4
diff --git a/target/linux/mvebu/cortexa9/config-5.4 b/target/linux/mvebu/cortexa9/config-5.4
new file mode 100644
index 0000000000..6aff77fda7
--- /dev/null
+++ b/target/linux/mvebu/cortexa9/config-5.4
@@ -0,0 +1,2 @@
+CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y
+CONFIG_THUMB2_KERNEL=y
IIRC, the WireGuard linker issue has been resolved since then.
On the mvebu target, CONFIG_ARM_THUMB is enabled by default. To save space, I normally disable both CONFIG_ARM_THUMB and CONFIG_THUMB2_KERNEL in my builds. Space is also one of the reasons I wanted Thumb-2: the kernel image is just about to exceed the 4 MiB partition on my router.
I tried enabling the kernel's CONFIG_ARM_THUMB and CONFIG_THUMB2_KERNEL options and adding -mthumb to OpenWrt's CONFIG_TARGET_OPTIMIZATION. I made no other changes.
make[3] -C target/linux install
WARNING: Image file /build/openwrta/build_dir/target-arm_cortex-a9+vfpv3_musl_eabi/linux-mvebu_cortexa9/linksys_wrt1900ac-v1-kernel.bin is too big: 4599687 > 4194304
ERROR: target/linux failed to build.
The kernel got bigger! And not just a bit bigger: the image now overshoots the 4194304-byte limit by 405383 bytes, so it grew by more than 400 KB.
It seems that dropping CONFIG_ARM_THUMB from the default config saves more space in the compressed kernel than building it as Thumb-2 does.
BTW, why is CONFIG_ARM_THUMB enabled by default if the userspace is ordinary ARMv7 (ARM mode) code?
It also seems that I can't build the userspace in Thumb mode (-mthumb) with the kernel in ARM mode, that is, with CONFIG_THUMB2_KERNEL unset but Thumb userspace support (CONFIG_ARM_THUMB) enabled:
make[8]: Entering directory '/build/openwrt.official/build_dir/target-arm_cortex-a9+vfpv3-d16_musl_eabi/linux-mvebu_cortexa9/linux-6.1.57'
CC arch/arm/vfp/vfpsingle.o
/build/openwrt.official/tmp/ccP9cnF1.s: Assembler messages:
/build/openwrt.official/tmp/ccP9cnF1.s:752: Error: thumb conditional instruction should be in IT block -- `movcc r2,r3'
/build/openwrt.official/tmp/ccP9cnF1.s:753: Error: thumb conditional instruction should be in IT block -- `orrcs r2,r3,#1'
make[8]: *** [scripts/Makefile.build:250: arch/arm/vfp/vfpsingle.o] Error 1
OpenWrt probably applies -mthumb to the kernel build too, even though the kernel is not configured to be built in Thumb mode.
Should I open a bug, or is this configuration too weird to support?
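
The suspected mismatch can be expressed as a small check. This is only a sketch: `kernel_option_enabled` and `thumb_flag_mismatch` are hypothetical helpers, not part of OpenWrt or the kernel build system, and the flag string `-Os -pipe -mthumb` is an assumed example rather than the flags from the actual build. It assumes the kernel configuration is available as plain `.config` text.

```python
import shlex


def kernel_option_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is set to y in kernel .config text."""
    for line in config_text.splitlines():
        if line.strip() == f"{option}=y":
            return True
    return False


def thumb_flag_mismatch(config_text: str, cflags: str) -> bool:
    """Hypothetical check: -mthumb is passed while the kernel is built in ARM mode."""
    has_mthumb = "-mthumb" in shlex.split(cflags)
    thumb2_kernel = kernel_option_enabled(config_text, "CONFIG_THUMB2_KERNEL")
    return has_mthumb and not thumb2_kernel


# Situation described above: Thumb userspace support is enabled,
# but the kernel itself is built in ARM mode.
config = """\
CONFIG_ARM_THUMB=y
# CONFIG_THUMB2_KERNEL is not set
"""

# Assumed example flags; prints True because -mthumb would reach an ARM-mode kernel.
print(thumb_flag_mismatch(config, "-Os -pipe -mthumb"))
```

If the build system ran a check along these lines, it could either drop -mthumb from the kernel's flags or refuse the combination up front, instead of failing later in the assembler.
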