OpenWrt Forum Archive

Topic: Kamikaze r6257 with BusyBox 1.4.0 getting Segmentation fault with ps

The content of this topic has been archived on 8 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

Kamikaze r6257 with BusyBox 1.4.0 gets a segmentation fault with the ps command.

root@OpenWrt:~# ps
  PID  Uid     VmSize Stat Command
Segmentation fault

All other BusyBox commands seem to be working. Will forcing it to use BusyBox 1.4.1 fix this issue (i.e., is it a bug), or is there some setting I need to change in the build process? I had a look at both the Kamikaze SVN and the BusyBox bug tracker, but didn't see anything. Any ideas?

I have had this problem with all the later builds. ps -uax gives me a crash. Hope it's just me missing something.

What platform is this on? Broadcom?

I thought it was just me - for me it occurs on a WL-500GP using the 2.6 kernel.

It may well be a kernel problem, not a BusyBox one.

Same happens on WL-500g Premium running Debian/mipsel (with a recent 2.6.19.2 openwrt/kamikaze kernel).

Also an Asus WL-500G Premium, running the latest trunk (as of a few days ago). What OS did you compile it on? I used RHEL4. I thought ps pulls its information from the /proc file system, so I doubt it's a kernel error. More likely ps isn't parsing the information correctly. Could it be because it was compiled on a platform where the /proc structure is different? I.e., during compile it generated the algorithm based on that proc system, which doesn't match the one being generated by the latest Linux kernel?

There is a ticket on this already: https://dev.openwrt.org/ticket/1131
And as you can see at the bottom there is a "fix" but not any solution yet.

I have also got this message ( asus wl-500gp ) both using 'ps' and 'strace'

agruman wrote:

There is a ticket on this already: https://dev.openwrt.org/ticket/1131
And as you can see at the bottom there is a "fix" but not any solution yet.

I have also got this message ( asus wl-500gp ) both using 'ps' and 'strace'

My symptoms aren't exactly the same. I will have to check the output of dmesg when I get home to verify whether I am seeing a kernel oops along with the segfault in ps. Also, whoever said it is not related to the wireless card is correct, because I have replaced my Broadcom mini-PCI card with the Atheros one, and it is detected and the module is loaded.

I checked and I am getting the oops.  Didn't notice it the first time round.

Oops[#1]:
Cpu 0
$ 0   : 00000000 10009c00 80848000 00000001
$ 4   : 80848000 fffdefab 0000000b 80848000
$ 8   : 00000000 00000000 00000000 7ff0efab
$12   : 00000001 0006d8b5 00000001 2abdeeac
$16   : 0000000b 0000000b 7ff0efab 80848000
$20   : 803cdc60 803cdc94 80848000 00000000
$24   : 00000003 800159e0
$28   : 808a2000 808a3df8 802ce858 80014b50
Hi    : 000000d0
Lo    : 000001b3
epc   : 8010e134     Tainted: P
ra    : 80014b50 Status: 10009c03    KERNEL EXL IE
Cause : 0000001c
PrId  : 00029006
Modules linked in: ath_pci wlan_xauth wlan_wep wlan_tkip wlan_ccmp wlan_acl ath_rate_sample ath_hal(P) wlan_scan_sta wlan_scan_ap wlan ipt_TTL ipt_ttl ipt_TOS ipt_time ipt_tos xt_MARK xt_mark xt_mac xt_length ipt_ECN ipt_ecn imq ipt_IMQ ipt_layer7 ipt_ipp2p xt_portscan xt_DELUDE xt_string ipt_recent ipt_owner xt_limit xt_helper xt_CONNMARK xt_connmark switch_robo switch_core diag
Process ps (pid: 992, threadinfo=808a2000, task=80841168)
Stack : 803cdc60 803cdc94 80848000 00000000 80069154 800690d0 802ce858 803cdc60
        8023827c 80f84f20 80ea0fab 0000000b 808a3e30 808a3e34 802ac400 80c0a5c0
        0000000b 00000000 803cdc60 fffffff4 80848000 00001000 802ce858 00000002
        00409710 800bbb98 000003ff fffffffd 8095c790 808a3f18 00000000 00000002
        802ce858 80848000 000003ff fffffff4 8095c790 808a3f18 7fe9c860 800bc394
        ...
Call Trace:[<80069154>][<800690d0>][<800bbb98>][<800bc394>][<8007d9b4>][<8007dd98>][<8007cfb0>][<80012860>]

Code: 30d80003  1306000a  00000000 <98a80000> 88a80003  24a50004  24c6fffc  ac880000  14d8fffa
note: ps[992] exited with preempt_count 1

Now the question is, what is the cause and how does it get fixed?
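
As a starting point for reading the dump, the "Cause : 0000001c" line can be decoded by hand. This is a minimal sketch, assuming the standard MIPS32 layout where the exception code (ExcCode) sits in bits 6..2 of the Cause register. The function name decode_cause is made up for illustration, and the table only covers the common codes. It says which exception the CPU raised, not why the kernel hit it.

```shell
#!/bin/sh
# Decode the ExcCode field (bits 6..2) of a MIPS32 Cause register value.
# decode_cause is a hypothetical helper name, not part of any existing tool.
decode_cause() {
    code=$(( ($1 >> 2) & 0x1f ))
    case "$code" in
        0)  name="Int (interrupt)" ;;
        1)  name="Mod (TLB modification)" ;;
        2)  name="TLBL (TLB miss on load or fetch)" ;;
        3)  name="TLBS (TLB miss on store)" ;;
        4)  name="AdEL (address error on load or fetch)" ;;
        5)  name="AdES (address error on store)" ;;
        6)  name="IBE (bus error on instruction fetch)" ;;
        7)  name="DBE (bus error on data load or store)" ;;
        8)  name="Sys (syscall)" ;;
        9)  name="Bp (breakpoint)" ;;
        10) name="RI (reserved instruction)" ;;
        11) name="CpU (coprocessor unusable)" ;;
        12) name="Ov (arithmetic overflow)" ;;
        13) name="Tr (trap)" ;;
        *)  name="not in this table" ;;
    esac
    echo "ExcCode $code: $name"
}

# Value taken from the "Cause" line of the oops above.
decode_cause 0x1c
```

For the value in this oops the field works out to 7, i.e. a bus error on a data access, which would fit a faulting load inside the kernel rather than a parsing problem in ps.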

The rewritten system code for broadcom 2.6 will be ready soon. We'll see if the problem still exists then...

OK, definitely a bug or compile problem with the kernel itself. I checked the procps source code, and it relies on being able to read /proc/<pid>/cmdline; the oops happens when it tries to do so. One can reproduce it by trying to cat the cmdline file directly.

root@OpenWrt:/proc/919# cat cmdline
Segmentation fault
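
To see whether every process is affected or only some, the same read can be repeated for each PID. This is a rough sketch, not something from the thread: probe_cmdline is a made-up name, and the proc root is a parameter only so the loop can be tried against a test directory. Each read runs in a separate cat process, so a crash in the reader should not take the script down. On an affected router every failing read will presumably log another oops, so use it sparingly.

```shell
#!/bin/sh
# Try to read <root>/<pid>/cmdline for every numeric entry under a proc root.
# probe_cmdline is a hypothetical helper name used for illustration.
probe_cmdline() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        # Skip entries without a cmdline file (or an unmatched glob).
        [ -e "$dir/cmdline" ] || continue
        pid=${dir##*/}
        # The read happens in a child process; only its exit status is used.
        if cat "$dir/cmdline" >/dev/null 2>&1; then
            echo "$pid ok"
        else
            # A status above 128 usually means the child died from a signal.
            # A process that exits mid-scan can also show up as a failure.
            echo "$pid FAILED (exit status $?)"
        fi
    done
}

probe_cmdline /proc
```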

This is probably something silly.  The program top for example seems to work.  Probably because it is only looking for the executable names.  How strange.

Mem: 9912K used, 3840K free, 0K shrd, 1012K buff, 3340K cached
Load average: 0.00 0.00 0.00
  PID USER     STATUS   RSS  PPID %CPU %MEM COMMAND
1013 root     R        400   987  0.9  2.8 top
  919 root     S        572   847  0.0  4.1 dropbear
  987 root     S        484   919  0.0  3.4 ash
  352 root     S <      424     1  0.0  3.0 udhcpc
    1 root     S        396     0  0.0  2.8 init
  847 root     S        388     1  0.0  2.8 dropbear
  143 root     S        364     1  0.0  2.6 syslogd
  977 nobody   S        360     1  0.0  2.6 dnsmasq
  841 root     S        328     1  0.0  2.3 crond
  146 root     S        296     1  0.0  2.1 klogd
  151 root     S        232     1  0.0  1.6 init
  857 root     S        176     1  0.0  1.2 httpd
   60 root     SW         0     1  0.0  0.0 mtdblockd
    4 root     SW<        0     1  0.0  0.0 khelper
  127 root     SWN        0     1  0.0  0.0 jffs2_gcd_mtd4
    2 root     SWN        0     1  0.0  0.0 ksoftirqd/0
    3 root     SW<        0     1  0.0  0.0 events/0
    5 root     SW<        0     1  0.0  0.0 kthread
   17 root     SW<        0     5  0.0  0.0 kblockd/0
   45 root     SW         0     5  0.0  0.0 pdflush
   46 root     SW         0     5  0.0  0.0 pdflush
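
If the guess above is right and top only needs executable names, a process list can be built without touching cmdline at all. The sketch below assumes the usual Linux layout of /proc/<pid>/stat, where the first field is the PID and the second is the executable name in parentheses. list_comm is a hypothetical name, and whether BusyBox top actually reads this file is an assumption, not something confirmed in the thread.

```shell
#!/bin/sh
# Print "<pid> <name>" for each process using only <root>/<pid>/stat.
# list_comm is a hypothetical helper name used for illustration.
list_comm() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/stat" ] || continue
        # The process may exit between the glob and the read; skip it then.
        read -r line < "$dir/stat" || continue
        pid=${line%% *}
        name=${line#*\(}     # drop everything up to the first "("
        name=${name%\)*}     # drop everything from the last ")" onward
        echo "$pid $name"
    done
}

list_comm /proc
```

On a router where ps crashes, this could serve as a stopgap for finding PIDs, at the cost of losing the command-line arguments.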

nbd wrote:

The rewritten system code for broadcom 2.6 will be ready soon. We'll see if the problem still exists then...

Is it to be expected anytime soon? Realistically - February, March 2007, or later?

I'm not good at making realistic predictions, so I simply won't make one this time :)

Are there any code snapshots available then?

Yes. brcm47xx-2.6 (marked as broken, because of flash read errors, initramfs target might work)

nbd wrote:

I'm not good at making realistic predictions, so I simply won't make one this time :)

Can you give us some hints then as to where to look for the source of this issue? Do you think it is some sort of typo in the code, or is it a fundamental issue with the support for this type of CPU? Perhaps the code is assuming the CPU can do something it can't? Unfortunately I am very busy these days, so rewriting whole chunks of code isn't going to happen, but if it's a bug I can assist in locating it.

nbd wrote:

Yes. brcm47xx-2.6 (marked as broken, because of flash read errors, initramfs target might work)

Did you mean setting Target Images -> ramdisk?

I don't see an image created in that case - is it built somewhere else than the usual "bin/"?

build_mipsel/linux/vmlinux only at the moment

Hmm, it's not "flashable", is it?

So the only way to use it is to use kexec (I don't see a zImage anywhere, though).

Or is there a way to flash the kernel directly on some partition?

Any update with this issue?  Has the new Broadcom CPU code been included in the latest version of Kamikaze?

I just tried this with revision 6567 (compiled with gcc-4.1.something and gcc-3.6.something) on an Asus WL500gP and ps still segfaults.

Did you use the old code, or new code (which you can enable in "developer options", as it's marked broken)?

I guess I used the old code. Will do a svn update now and recompile, but I'm still not sure where to enable the developer code?

You enable it somewhere in advanced options, build options, or whatever it is called.

"make menuconfig" doesn't have that many options, play around with it and you'll surely find it.

I'm really stumped here. I use:

root@llama:/home/openwrt/mips-sandi/trunk# make menuconfig V=99 DEVELOPER=1 FORCE=1

As far as I can tell, these are the relevant platform options in my .config:

CONFIG_LINUX_2_6_BRCM47XX=y
CONFIG_LINUX_2_6_BRCM47XX_Atheros=y
CONFIG_PCI_SUPPORT=y
CONFIG_USB_SUPPORT=y
CONFIG_ATM_SUPPORT=y
CONFIG_VIDEO_SUPPORT=y
CONFIG_USES_SQUASHFS=y
CONFIG_mipsel=y
CONFIG_ARCH="mipsel"
CONFIG_DEVEL=y
CONFIG_BROKEN=y
CONFIG_BUILDOPTS=y
CONFIG_AUTOREBUILD=y
CONFIG_JLEVEL=2
CONFIG_TOOLCHAINOPTS=y
CONFIG_BINUTILS_VERSION_2_16_1=y
CONFIG_BINUTILS_VERSION="2.16.1"
CONFIG_EXTRA_GCC_CONFIG_OPTIONS=""
CONFIG_INSTALL_LIBSTDCPP=y
CONFIG_LARGEFILE=y
CONFIG_TARGET_OPTIMIZATION="-Os -pipe -mips32 -mtune=mips32 -funit-at-a-time"
CONFIG_GCC_VERSION="3.4.6"

I guess the CONFIG_DEVEL=y means that the developer code is enabled. Even so, ps still segfaults just like it did before. Oh, my trunk version is 6626.
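
A quick way to confirm what a build tree is actually configured for is to check the symbols in .config directly, rather than inferring it from the menus. This is a small sketch using the symbol names from the listing above; config_enabled and report_config are made-up helper names. It assumes that CONFIG_DEVEL and CONFIG_BROKEN only make extra options selectable, and that the target symbol (here CONFIG_LINUX_2_6_BRCM47XX) is what decides which kernel code gets built.

```shell
#!/bin/sh
# Report whether given symbols are set to "y" in an OpenWrt-style .config.
# config_enabled and report_config are hypothetical helper names.
config_enabled() {
    # $1 = config file, $2 = symbol name (e.g. CONFIG_DEVEL)
    grep -q "^$2=y\$" "$1" 2>/dev/null
}

report_config() {
    file=$1
    shift
    for sym in "$@"; do
        if config_enabled "$file" "$sym"; then
            echo "$sym=y"
        else
            echo "$sym is not set"
        fi
    done
}

# Run from the top of the build tree; does nothing if .config is missing.
if [ -f .config ]; then
    report_config .config CONFIG_DEVEL CONFIG_BROKEN CONFIG_LINUX_2_6_BRCM47XX
fi
```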

The discussion might have continued from here.