Kernel panic with kexec on ramips

Hi,

I'm trying to use kexec to load a kernel with an initrd; the goal is to boot a kernel from an SD card on an Onion Omega 2p.

Right now, for testing, I copied a kernel onto the device and ran kexec on it; I also tried loading it from /tmp/.

After launching the command, this is the output:

root@(none):/overlay# kexec -d -l /overlay/vmllinux.elf
Try gzip decompression.
Try LZMA decompression.
lzma_decompress_file: read on /overlay/vmllinux.elf of 65536 bytes failed
kernel: 0x77287010 kernel_size: 0xbae724
Modified cmdline: 
Unable to find /proc/device-tree/chosen/[linux,]stdout-path, printing from purgatory is disabled
kexec_load: entr[ 1006.490365] kexec command line truncated to 256 bytes
y = 0x39baa0 fla[ 1006.496700] usercopy: kernel memory overwrite attempt detected to 80011f64 (<kernel text>) (256 bytes)
gs = 0x80000
nr_[ 1006.507551] Kernel bug detected[#1]:
[ 1006.512546] CPU: 0 PID: 1088 Comm: kexec Not tainted 4.14.149 #0
[ 1006.518636] task: 87d53180 task.stack: 87c2e000
[ 1006.523223] $ 0   : 00000000 00000000 0000005a 8048c394
[ 1006.528529] $ 4   : 80490000 00003270 00000000 805e7d98
[ 1006.533831] $ 8   : 00000000 000000af 00000007 00000000
[ 1006.539134] $12   : 00000000 00000000 0007943c ffffffff
[ 1006.544436] $16   : 80011f64 00000100 00000000 80012064
[ 1006.549739] $20   : 80010000 80010000 00000000 00424eb0
[ 1006.555042] $24   : 00000001 8020c890                  
[ 1006.560346] $28   : 87c2e000 87c2fe78 804a0000 800dee94
[ 1006.565651] Hi    : 000000c0
[ 1006.568567] Lo    : ec4e4000
[ 1006.571502] epc   : 800dee94 __check_object_size+0x1b4/0x1e0
[ 1006.577240] ra    : 800dee94 __check_object_size+0x1b4/0x1e0
[ 1006.582970] Status: 1100e403	KERNEL EXL IE 
[ 1006.587220] Cause : 10800024 (ExcCode 09)
[ 1006.591282] PrId  : 00019655 (MIPS 24KEc)
[ 1006.595340] Modules linked in: pppoe ppp_async option usb_wwan qmi_wwan pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt76x2_common mt76x02_lib mt7603e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD usbserial usbnet slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat cdc_wdm nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 vfat fat nls_utf8 nls_iso8859_1 nls_cp437 mmc_block mtk_sd mmc_core
[ 1006.667074]  leds_gpio ohci_platform ohci_hcd ehci_platform ehci_hcd gpio_button_hotplug ext4 mbcache jbd2 usbcore nls_base usb_common crc16 mii crc32c_generic crypto_hash
[ 1006.682644] Process kexec (pid: 1088, threadinfo=87c2e000, task=87d53180, tls=77f4eec8)
[ 1006.690750] Stack : 8043091c 8043b094 80443940 80011f64 8043b064 00000100 87088600 0041e02c
[ 1006.699236]         00000100 80011f4c 87088600 80011a18 87428000 00000100 87088630 80430914
[ 1006.707719]         87088600 87088600 87088620 00000000 00000030 00000000 00000003 8007d348
[ 1006.716200]         77246000 87e8fdc0 7fd1e120 87e8fdc0 00000000 80608cf8 00000010 87295d80
[ 1006.724682]         5ddb8245 20c855b7 77f45020 0041e340 7fd1e644 00401151 77f22be4 77f21878
[ 1006.733165]         ...
[ 1006.735645] Call Trace:
[ 1006.738128] [<800dee94>] __check_object_size+0x1b4/0x1e0
[ 1006.743530] [<80011a18>] machine_kexec_prepare+0x128/0x2cc
[ 1006.749095] [<8007d348>] SyS_kexec_load+0x278/0x3c0
[ 1006.754043] [<8001042c>] syscall_common+0x34/0x58
[ 1006.758814] Code: 02003825  0c015824  2484b0ac <000c000d> 8fbf002c  8fb30028  8fb20024  8fb10020  8fb0001c 
[ 1006.768718] 
segments = 3
seg[ 1006.770281] ---[ end trace 088288e9562225d9 ]---
ment[0].buf   = 0x77288010
segme[ 1006.778597] Kernel panic - not syncing: Fatal exception
[ 1006.786218] Rebooting in 3 seconds..
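
A note for reading the log above: the kexec -d debug output and the kernel's printk messages share one console, so the user-space lines are cut mid-word wherever a kernel message starts. Here is a minimal Python sketch (a hypothetical helper, not part of kexec-tools) that separates the two streams. It assumes every kernel line starts with a `[ seconds.microseconds]` stamp and that the line break after a kernel message belongs to the kernel, not to kexec.

```python
import re

# Assumption: printk timestamps look like "[ 1006.490365]" (six decimals).
STAMP = re.compile(r"\[\s*\d+\.\d{6}\]")

def split_console(text):
    """Split interleaved console text into (user_space_text, kernel_lines)."""
    user, kernel = [], []
    for line in text.splitlines():
        m = STAMP.search(line)
        if m is None:
            # No kernel message on this line: a complete user-space line.
            user.append(line + "\n")
        else:
            # User-space fragment interrupted by a kernel message.
            user.append(line[:m.start()])
            kernel.append(line[m.start():])
    return "".join(user), kernel

sample = (
    "kexec_load: entr[ 1006.490365] kexec command line truncated to 256 bytes\n"
    "y = 0x39baa0 fla[ 1006.496700] usercopy: kernel memory overwrite attempt\n"
    "gs = 0x80000\n"
    "nr_[ 1006.507551] Kernel bug detected[#1]:\n"
    "segments = 3\n"
)
user_text, kernel_lines = split_console(sample)
print(user_text)
```

On that fragment the user-space half reassembles into "kexec_load: entry = 0x39baa0 flags = 0x80000" followed by "nr_segments = 3", which is easier to compare against a clean run.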

I've already used kexec successfully on another machine (x86), but I'm no expert and I'm not sure what to do next to troubleshoot this problem.

Thanks in advance

I am no expert either, but as far as I can see, decompression failed, probably because you supply the kernel in a different format (ELF?) than kexec expects. So you need either to choose the correct format with the kexec options (the kexec --help output would help), or to supply the kernel in a format supported by kexec (uImage maybe?).

The format required by kexec is elf-mips, and that is what I'm feeding it; I checked with the file utility:

 file vmlinux.elf 
vmlinux.elf: ELF 32-bit LSB executable, MIPS, MIPS32 rel2 version 1 (SYSV), statically linked, stripped

Also, if the format is wrong, kexec gives a different error. I'm not sure how to check the compression format, though.

Any idea how to check that?
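
For checking the format without binwalk, looking at the first few bytes of the file is usually enough. A minimal Python sketch, assuming the well-known magic numbers for ELF, gzip, xz and legacy U-Boot uImage headers. The function names are made up for illustration, and the .lzma check is only a heuristic because that format has no real magic number.

```python
# Leading byte signatures; the .lzma entry is a heuristic, not a true magic.
MAGICS = [
    (b"\x7fELF", "ELF"),
    (b"\x1f\x8b", "gzip"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x27\x05\x19\x56", "uImage"),
    (b"\x5d\x00\x00", "lzma (heuristic)"),
]

def identify(header: bytes) -> str:
    """Name the container or compression format from the first bytes."""
    for magic, name in MAGICS:
        if header.startswith(magic):
            return name
    return "unknown"

def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file and identify them."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

Calling identify_file("vmlinux.elf") on the build host would be one way to use it; a result of "ELF" only says the outer container is ELF, not whether something compressed is embedded further in.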

binwalk perhaps? Can you tell me where you got this file?

The kernel file is from the OpenWrt build process:
build_dir/target-mipsel_24kc_musl/linux-ramips_mt76x8

binwalk:

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             ELF, 32-bit LSB MIPS64 executable, MIPS, version 1 (SYSV)
3825796       0x3A6084        Linux kernel version "4.14.149 (canavisia@canavisia-Aspire-E5-574G) (gcc version 8.3.0 (OpenWrt GCC 8.3.0 r11242-889b841048)) #0 Fri Oct 18 12:20:40 2"
3903488       0x3B9000        CRC32 polynomial table, little endian
4387000       0x42F0B8        ASCII cpio archive (SVR4 with no CRC), file name: "op (skipped) already calibrated this CPU", file name length: "0xbrating ", file size: "0xInitramf"
4390929       0x430011        Unix path: /arch/mips/include/asm/fpu.h
4404377       0x433499        Unix path: /include/linux/sched/signal.h
4482120       0x446448        xz compressed data
4517072       0x44ECD0        Unix path: /lib/firmware/updates/4.14.149
4535168       0x453380        Unix path: /sys/firmware/devicetree/base
4544161       0x4556A1        Neighborly text, "neighbor table overflow!tics"
4563060       0x45A074        Neighborly text, "NeighborSolicitsports"
4563080       0x45A088        Neighborly text, "NeighborAdvertisements"
4565906       0x45AB92        Neighborly text, "neighbor %.2x%.2x.%pM lost rename link %s to %s"
4890624       0x4AA000        ELF, 32-bit LSB MIPS64 shared object, MIPS, version 1 (SYSV)
5050820       0x4D11C4        ASCII cpio archive (SVR4 with no CRC), file name: "dev", file name length: "0x00000004", file size: "0x00000000"
5050936       0x4D1238        ASCII cpio archive (SVR4 with no CRC), file name: "dev/console", file name length: "0x0000000C", file size: "0x00000000"
5051060       0x4D12B4        ASCII cpio archive (SVR4 with no CRC), file name: "root", file name length: "0x00000005", file size: "0x00000000"
5051176       0x4D1328        ASCII cpio archive (SVR4 with no CRC), file name: "TRAILER!!!", file name length: "0x0000000B", file size: "0x00000000"

For this test I'm trying to kexec the exact same kernel that is already running on the device.
The idea is to be able to kexec newer kernel versions from external storage, always built from OpenWrt.

Apparently it's uncompressed; at least the xz entry seems to be bogus, though of course you can try tail -c +4482121 vmlinux.elf | xzcat > unpacked and analyze the content of the resulting file with hexdump, strings and binwalk.
Notice that the command uses the decimal offset + 1.
I have looked up kexec --help, but it would help if you posted what your particular version outputs.
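
On the offset: tail -c +K starts output at byte K counted from 1, while binwalk's offsets count from 0, hence the +1. The same carve can be sketched in Python, which indexes from 0 and so takes the binwalk offset directly. This is a rough sketch with a made-up helper name; it uses the standard lzma module and returns an empty result when the bytes at that offset are not a real xz stream (a false positive in the scan).

```python
import lzma

def carve_xz(blob: bytes, offset: int) -> bytes:
    """Decompress the xz stream assumed to start at a 0-based offset.

    Trailing bytes after the end of the stream are ignored. Returns b""
    if the data at that offset is not a valid xz stream.
    """
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    try:
        return decomp.decompress(blob[offset:])
    except lzma.LZMAError:
        return b""

# Self-contained demo: an xz stream embedded between unrelated bytes.
demo = b"junk" + lzma.compress(b"hello") + b"tail"
print(carve_xz(demo, 4))
```

For the file in this thread the call would be carve_xz(open("vmlinux.elf", "rb").read(), 4482120); an empty result would agree with the guess that the xz entry is bogus.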

I tried tail -c +4482121 vmlinux.elf | xzcat > unpacked,
but it printed an error about the data stream being corrupted. I'm not sure I understood correctly what you asked me to do; I'm not very familiar with binwalk and binary formats (especially ELF kernels).

The output of kexec on the target:

root@(none):/# kexec -h
kexec-tools 2.0.16
Usage: kexec [OPTION]... [kernel]
Directly reboot into a new kernel

 -h, --help           Print this help.
 -v, --version        Print the version of kexec.
 -f, --force          Force an immediate kexec,
                      don't call shutdown.
 -x, --no-ifdown      Don't bring down network interfaces.
 -y, --no-sync        Don't sync filesystems before kexec.
 -l, --load           Load the new kernel into the
                      current kernel.
 -p, --load-panic     Load the new kernel for use on panic.
 -u, --unload         Unload the current kexec target kernel.
                      If capture kernel is being unloaded
                      specify -p with -u.
 -e, --exec           Execute a currently loaded kernel.
 -t, --type=TYPE      Specify the new kernel is of this type.
     --mem-min=<addr> Specify the lowest memory address to
                      load code into.
     --mem-max=<addr> Specify the highest memory address to
                      load code into.
     --reuseinitrd    Reuse initrd from first boot.
     --print-ckr-size Print crash kernel region size.
     --load-preserve-context Load the new kernel and preserve
                      context of current kernel during kexec.
     --load-jump-back-helper Load a helper image to jump back
                      to original kernel.
     --entry=<addr>   Specify jump back address.
                      (0 means it's not jump back or
                      preserve context)
                      to original kernel.
 -s, --kexec-file-syscall Use file based syscall for kexec operation
 -d, --debug          Enable debugging to help spot a failure.
 -S, --status         Return 0 if the type (by default crash) is loaded.

Supported kernel file types and options: 
elf-mips
Architecture options: 
    --command-line=STRING Set the kernel command line to STRING.
    --append=STRING       Set the kernel command line to STRING.
    --dtb=FILE            Use FILE as the device tree blob.
    --initrd=FILE         Use FILE as initial ramdisk.

root@(none):/# 

Also uname -a:

Linux (none) 4.14.149 #0 Fri Oct 18 12:20:40 2019 mips GNU/Linux

It is perfectly normal and expected in this case, because the xz stream is embedded in another file. xzcat should still unpack what it can despite the warning. You should have a file named unpacked in the directory. Check its content, that's all.

I do not know what else you can do, though. Except maybe you can give it a dtb file? It should also have been built if you compiled the complete image from source.

Thanks. At the moment I don't have the device with me; I will post an update tomorrow.

I've managed to try:

  1. tail -c +4482121 vmlinux.elf | xzcat > unpacked produces a zero-byte file.
  2. I used extract-vmlinux vmlinux.elf > unpack.elf (from the Linux repository), but it produced a file of the same size. Maybe the kernel is not compressed? Edit: or it doesn't work with .elf files.
  3. I tried booting both the compressed kernel and the "uncompressed" one with the dtb as a parameter; in both cases the output was:
root@OpenWrt:~# kexec -d --dtb=/tmp/onion.dtb --command-line="" /tmp/vmllinux.elf 
Try gzip decompression.
Try LZMA decompression.
lzma_decompress_file: read on /tmp/vmllinux.elf of 65536 bytes failed
kernel: 0x7797f010 kernel_size: 0x4d1794
kexec_load: entry = 0x39baa0 flags = 0x80000
nr_segments = 3
segment[0].buf   = 0x77980010
segment[0].bufsz = 0x4d03cc
segment[0].mem   = 0
segment[0].memsz = 0x612000
segment[1].buf   = 0x41e02c
segment[1].bufsz = 0x200
segment[1].mem   = 0x612000
segment[1].memsz = 0x1000

The output is truncated since I didn't have access to a serial port and had to do it via ssh, but from what I see it's the same kernel panic (since the actual booting of the new kernel happens with kexec -e).

So it seems that loading the kernel is the main problem. I'd imagine I need to change the memory address, but I'm not sure what value to use.

If someone needs the complete output of the kernel panic with the dtb parameter, I can post it tomorrow.

I'm not sure what to try next; documentation on kexec is scarce.

I am seeing this fault in 21.02-rc3 with the 5.4 kernel and also building from head with the 5.10 kernel. This appears to be completely broken. Did you have any luck? Note: This is crashing the kernel before kexec -e is being run.

# kexec -l --append="console=ttyS0,115200 rootfstype=squashfs" /tmp/vmlinux.elf 
Modified cmdline:console=ttyS0,115200 rootfstype=squashfs 
Unable to find /proc/device-tree/chosen/[linux,]stdout-path, printing from purgatory is disabled
[  761.663355] kexec command line truncated to 256 bytes
[  761.668452] usercopy: Kernel memory overwrite attempt detected to kernel text (offset 81844, size 256)!
[  761.677856] Kernel bug detected[#1]:
[  761.681423] CPU: 2 PID: 1755 Comm: kexec Not tainted 5.10.43 #0
[  761.687317] $ 0   : 00000000 00000001 0000005b 00988000
[  761.692545] $ 4   : 805e4408 8101e378 810238f8 822e1ca8
[  761.697770] $ 8   : 00000001 822e1cc0 00000000 000019c8
[  761.702991] $12   : 74206465 ffffff7f 00000001 656b206f
[  761.708214] $16   : 800153b4 00000100 00000000 800154b4
[  761.713435] $20   : 0041e02c 80010000 00000000 00000000
[  761.718656] $24   : 00000000 80314988                  
[  761.723877] $28   : 822e0000 822e1e50 80650000 801507e8
[  761.729100] Hi    : 00000125
[  761.731964] Lo    : 122f2000
[  761.734845] epc   : 801507e8 usercopy_abort+0x94/0x98
[  761.739886] ra    : 801507e8 usercopy_abort+0x94/0x98
[  761.744913] Status: 1100fc03 KERNEL EXL IE 
[  761.749097] Cause : 50800024 (ExcCode 09)
[  761.753086] PrId  : 0001992f (MIPS 1004Kc)
[  761.757161] Modules linked in: mt7915e mt76 mac80211 cfg80211 hwmon crc_ccitt compat sha256_generic libsha256 seqiv jitterentropy_rng drbg hmac cmac leds_gpio gpio_button_hotplug zram zsmalloc
[  761.774316] Process kexec (pid: 1755, threadinfo=c9ceaefe, task=a03f7798, tls=77ef2ec8)
[  761.782284] Stack : 00000100 80568ddc 805b702c 80568f04 8055f414 8055f414 8055f414 00013fb4
[  761.790642]         00000100 8237ce00 00000100 80150940 822e1eb4 80010000 0041e02c 80069818
[  761.798997]         00000100 8237ce00 38e38e39 00000100 80015388 8237ce00 80010000 80014cb8
[  761.807354]         822e1eb0 00000100 00000006 8054f4ba 8237ce00 00000000 807c8ee0 8237ce20
[  761.815709]         00000000 00000003 00000000 800a4498 00000000 8012a078 00000000 815ba758
[  761.824061]         ...
[  761.826505] Call Trace:
[  761.828944] [<801507e8>] usercopy_abort+0x94/0x98
[  761.833632] [<80150940>] __check_object_size+0x154/0x1c4
[  761.838945] [<80014cb8>] machine_kexec_prepare+0x124/0x2c4
[  761.844430] [<800a4498>] sys_kexec_load+0x258/0x388
[  761.849293] [<800135f8>] syscall_common+0x34/0x58
[  761.853980] 
[  761.855461] Code: afa30010  0c01a5fe  24848e80 <000c000d> 3c02805d  8c4293fc  1c40006c  00000000  27bdffd0 
[  761.865209] 
[  761.866973] ---[ end trace 8145ef41fc12390f ]---
[  761.871748] Kernel panic - not syncing: Fatal exception
[  761.876984] Rebooting in 3 seconds..

@hexmiles, @strontium aren't you hit by the commit "Kernel: Activate CONFIG_HARDENED_USERCOPY"?

I found that commit through an issue: CONFIG_HARDENED_USERCOPY detects kernel memory overwrite attempt to kernel text

That commit landed in OpenWrt master in May 2019, and the fix was included in 5.13-rc7:

Merged upstream for 5.13-rc7 inclusion.
2021-02-26 110febc ARC: fix CONFIG_HARDENED_USERCOPY

@xabolcs That is PRECISELY the issue. I modified the kernel config to say "CONFIG_HARDENED_USERCOPY=n" and kexec now works again.
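
To confirm whether a given kernel was built with the option, its config can be checked. A small Python sketch, assuming you have the config text at hand (for example the .config from the build tree, or /proc/config.gz if the kernel was built with CONFIG_IKCONFIG_PROC, which is an assumption and may not hold for your image). config_value is a hypothetical helper, and the sample text is invented for the demo.

```python
def config_value(config_text, option):
    """Return the value of a kernel config option.

    "n" means the option is explicitly listed as not set; None means the
    option does not appear in the config text at all.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None

# Invented sample in the usual kernel .config syntax.
sample_config = """\
CONFIG_KEXEC=y
# CONFIG_HARDENED_USERCOPY is not set
CONFIG_LOCALVERSION=""
"""
print(config_value(sample_config, "CONFIG_HARDENED_USERCOPY"))
```

Note that a disabled option normally shows up in the generated .config as a "# ... is not set" comment rather than as "=n", which is why the sketch checks for both forms.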

You are a lifesaver, thank you for the pointer. Now I just need to make kexec able to load a uImage, which shouldn't be too difficult. kexec-tools already supports this for other targets, just not mips for some reason.

Ohh, great, I'm glad that it helped! :+1:

I saw your mails on the list. (Without response ... :confused:)
How about backporting that fix to OpenWrt?

Will do, along with a patch for kexecing a uImage on mips (once that's working), so that we should be able to kexec directly into a sysupgrade image held on another flash partition.

When you are done, it would be nice to backport to 21.02 and 19.07 too!

@hauke's commit dates back to 19.07.0-rc1 ... does that mean kexec was working in 18.06? I haven't tried it, just thinking.

While this fixes kexec, unfortunately the MT7621 Linux kernel does not like it.

It starts up and detects only 2 processors, and not 4.
Then it complains the PCIE bus is already set up.
It then assigns all the peripherals to different IRQs than it does on a clean boot.
And before it can print the failsafe boot options, it just hangs.

Looks like the MT7621 drivers (and maybe the mips drivers in general) do not de-initialize themselves properly and rely on the bootloader to give them a clean state on start.

Just an update in case anyone else comes here wondering how to do it.
MT7621 as it stands is broken for kexec. Fixable, but broken.
I don't think my "fixes" would make it into openwrt.

  1. OpenWrt, and now upstream as of 5.13, applies a patch to detect the single-core MT7621S variant of the chip. This forces the kexec'd kernel to drop to a single core. I don't know of any way to fix it for the dual-core MT7621 without breaking it for the single-core MT7621.
  2. It looks like both the Ethernet drivers and the PCI drivers leave the hardware in an odd state when the kernel shuts down: clocks disabled and in reset, which is NOT the state it is in coming out of u-boot. When you try to init the drivers in this state, they hang. I fixed that with some grotesque hackery that forces all peripherals into a hardware reset and back out before the kernel itself loads and runs. (I hacked the standalone LZMA loader code to achieve this.) It's ugly, but it does work. With both of these hacks an MT7621 will kexec properly.

If you have a choice and you want to use kexec, use different hardware.