[Solved] WRT1900ACV1 reboots: kernel 4.9

Tested davidc502's r3681-e58ea0a, still getting reboots.
No crashlog in /sys/kernel/debug.

hmm...

Edit: r3640 ran without incident for ~6 days 16 hours. But given the continued reports of reboots, the result above, and a kernel push, I decided to build and flash a new image; r3716-cd0f990 is running now.

@anomeome I don't think that was the issue, I just tried the same version. By the time I'd walked back to the couch my Nexus Player didn't have wifi and I heard the fan start up as it booted :cry:

I got just over 6 hrs on r3716 before a reboot. And again no crashlog.

I have been doing kernel bumps and so far no real difference.
Linux version 4.9.16 (jeff@jeff-VM) (gcc version 5.4.0 (LEDE GCC 5.4.0 r3765-53b84e4) ) #0 SMP Mon Mar 20 07:26:35 2017
The 4 entries below are in the syslog, and have been all along.
Mon Mar 20 11:10:21 2017 kern.err kernel: [ 1.263197] of: dev_pm_opp_of_cpumask_add_table: couldn't find opp table for cpu:0, -19
Mon Mar 20 11:10:21 2017 kern.err kernel: [ 1.263212] cpu cpu1: opp_list_debug_create_link: Failed to create link
Mon Mar 20 11:10:21 2017 kern.err kernel: [ 1.263218] cpu cpu1: _add_opp_dev: Failed to register opp debugfs (-12)
Mon Mar 20 11:10:21 2017 kern.warn kernel: [ 1.698674] This architecture does not have kernel memory protection..
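
Entries like the ones above can be pulled out of a saved log mechanically, which makes it easier to compare one build against the next. This is only a minimal sketch: it assumes the log was saved as text in the same layout as the lines quoted in this thread (timestamp, facility.level, source, bracketed kernel time). The name `kernel_problems` and the regular expression are illustrative and not part of any LEDE tool.

```python
import re

# Assumed line layout, taken from the entries quoted in this thread:
#   Mon Mar 20 11:10:21 2017 kern.err kernel: [ 1.263197] message
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) "
    r"(?P<facility>\w+)\.(?P<level>\w+) "
    r"(?P<source>[^:]*): "
    r"(?:\[ *(?P<ktime>\d+\.\d+)\] )?"
    r"(?P<message>.*)$"
)


def kernel_problems(lines, levels=("err", "warn")):
    """Return (kernel_time, level, message) for kern.* lines at the given levels."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip lines that do not parse, are not from the kernel facility,
        # or are at a level we were not asked about.
        if not m or m["facility"] != "kern" or m["level"] not in levels:
            continue
        ktime = float(m["ktime"]) if m["ktime"] else None
        found.append((ktime, m["level"], m["message"]))
    return found
```

Feeding it the lines of a saved log returns only the kern.err and kern.warn entries, so a recurring message such as the opp debugfs failure stands out from the routine boot messages.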

Another strange thing: in menuconfig, KERNEL_CRASHLOG=n,
yet in config-kernel.in KERNEL_CRASHLOG defaults to y.

I really wish I knew more about the build system. Oh well, I will keep trying in my limited way.

Edit: Just noticed that syslog and the LuCI overview show different versions?
LEDE Reboot SNAPSHOT r3792-0685f2a / LuCI Master (git-17.078.53745-180f2d6)

Really nothing new here. I think the crash stuff gets turned on by patch:

./target/linux/generic/patches-4.9/930-crashlog.patch

so it is there OOTB.

Do you have security features turned on in your build:

CONFIG_PACKAGE_procd-seccomp=y
CONFIG_PACKAGE_procd-ujail=y
CONFIG_KERNEL_NAMESPACES=y
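
To confirm which of these symbols a build actually has set, the `.config` in the build root can be inspected. This is a minimal sketch under the assumption that the file uses the usual kconfig text format (`CONFIG_X=y` for a set symbol, `# CONFIG_X is not set` for an unset one). `symbol_states` is a hypothetical helper name, not something provided by the build system.

```python
import re

# Package symbols such as CONFIG_PACKAGE_procd-seccomp contain lowercase
# letters and hyphens, so the character class allows both.
SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_-]+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_-]+) is not set$")


def symbol_states(config_text, wanted):
    """Map each wanted symbol to its value, "n" if unset, or None if absent."""
    states = {name: None for name in wanted}
    for line in config_text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m and m.group(1) in states:
            states[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m and m.group(1) in states:
            states[m.group(1)] = "n"
    return states
```

Calling `symbol_states(open(".config").read(), [...])` with the three symbols above, or with `CONFIG_KERNEL_CRASHLOG`, shows at a glance whether each one is on, off, or missing from the file.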

The syslog shows the revision at the moment the toolchain gcc was compiled, not the firmware revision.
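
To make that distinction concrete, the kernel banner can be split into its parts. This is a minimal sketch that assumes the banner has the same shape as the "Linux version ..." line quoted earlier in the thread; `parse_banner` and its regular expressions are illustrative only.

```python
import re

# Matches banners shaped like the one quoted in this thread:
#   Linux version 4.9.16 (user@host) (gcc version 5.4.0 (LEDE GCC 5.4.0 r3765-53b84e4) ) ...
BANNER_RE = re.compile(
    r"Linux version (?P<kernel>\S+) .*?"
    r"\(gcc version (?P<gcc>\S+) \((?P<toolchain>[^)]*)\)"
)
# Revision strings in this thread look like r3765-53b84e4.
REV_RE = re.compile(r"\br\d+-[0-9a-f]+\b")


def parse_banner(banner):
    """Return kernel version, gcc version and toolchain revision, or None."""
    m = BANNER_RE.search(banner)
    if not m:
        return None
    rev = REV_RE.search(m["toolchain"])
    return {
        "kernel": m["kernel"],
        "gcc": m["gcc"],
        "toolchain_revision": rev.group(0) if rev else None,
    }
```

The `toolchain_revision` field is the value that can lag behind the firmware revision shown on the LuCI overview, since the toolchain is not necessarily rebuilt for every image.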

@hnyman Thanks, I was wondering about that.
@anomeome I will check when done with this build; I have enabled a few different things to try.

@anomeome No on all 3
I will enable on next build. Are you stable after those changes? And is there a crashlog generated if not?

No, everything is the same, I just thought if you were running that way I might try turning that stuff off. So, obviously not the issue.

At any rate, when you turn on namespaces and seccomp in Global build settings, you will find a couple of NEW menu items under Base system to select.

Built and flashed. It does add some interesting things to the syslog, including:
Mon Mar 20 17:05:34 2017 user.notice : setting up led usb3
Mon Mar 20 17:05:35 2017 kern.warn kernel: [ 35.755822] BUG: key cd072ca0 not in .data!
Mon Mar 20 17:05:35 2017 kern.warn kernel: [ 35.760221] BUG: key cd072a20 not in .data!
Mon Mar 20 17:05:35 2017 kern.warn kernel: [ 35.764624] BUG: key cd072d20 not in .data!
Mon Mar 20 17:05:35 2017 kern.warn kernel: [ 35.769086] BUG: key cd072520 not in .data!
Mon Mar 20 17:05:35 2017 kern.warn kernel: [ 35.773323] BUG: key cd072820 not in .data!
LOL!

After an uptime of close to 12 days, and with yesterday's kernel push, I decided to build and flash a new image. I am beginning to think that whatever the issue may have been, it has been resolved.

@northbound, I do not see any of those messages in my log upon a boot with all of those security attributes enabled.

@anomeome That is caused by CONFIG_KERNEL_PROVE_LOCKING=y
I thought that might be why I was not getting random reboots. I am on 4.9.18 now. I will remove that on the next build. I was trying too many changes at a time. :smile:

Builds from https://downloads.lede-project.org/snapshots/targets/mvebu/generic/ are still crashing/rebooting for me. I just tried r3883-2ebfdab and it rebooted while installing packages.

Yep, disappointing indeed. r3716-cd0f990 ran for close to 12 days with no random reboot, r3873-02fe942 rebooted last night after about 1 day.

I think it was just a hard lockup. It seemed to be 10 sec., then reboot. I think that is why no crashlog was created. I am beginning to think it may have been a kernel issue. Since I have been getting ahead of trunk, the issue has gone away for me, up to Linux LEDE 4.9.20 #0 SMP Thu Mar 30 16:31:02 2017 armv7l GNU/Linux. All has been stable. Should this bug be closed?
https://bugs.lede-project.org/index.php?do=details&task_id=564

Edit: @anomeome The changes you had caused 2 reboots in an hour, so I backed them out.
I am not complaining just letting you know what happened here with your changes.

Edit2: Sorry, forgot this part:
Fri Mar 31 19:56:58 2017 kern.err kernel: [ 1.276788] cpu cpu1: opp_list_debug_create_link: Failed to create link
Fri Mar 31 19:56:58 2017 kern.err kernel: [ 1.283434] cpu cpu1: _add_opp_dev: Failed to register opp debugfs (-12)

This part was not addressed. Is this part of the old problem or not? Just curious. Still works fine here.

@northbound, I have run with seccomp and namespaces turned on in my build since a few issues were resolved a number of weeks back, while still on the 4.4 kernel. The reboots started with 4.9, and as you did not have those attributes enabled but were experiencing the reboots, I arrived at the conclusion they were not the issue; I wonder if @InkblotAdmirer was running with those. Also, as per my earlier post, I had a build run ~12 days without a reboot, updated to a new build (kernel update), and they started again. The same build on rango has no issues.

At any rate, instead of moving ahead with a kernel on LEDE, I have been taking a look at DSA, linux-next and other assortments on an owrt image based on @sera's patchset on the mamba. I will probably do another LEDE image on a kernel push.

I did not add seccomp or namespaces.

On 4.9.18 I had a mamba device up for ~4 days so when 4.9.20 was released I updated both mambas I own. One of them rebooted in < 24 hrs and since one of them is "mission critical" in my home network I reverted both back to 4.4 where uptime lasts the typical two weeks between firmware updates.

Since I have no compelling reason to upgrade from 4.4 on the mambas I doubt I will pursue this any further. It's a little annoying to have to build a separate kernel for the device but that's a small price to pay for stability.

Any ideas on why leaving 4.4 support is causing issues?

I don't believe that it is; I just think it is a matter of moving master (aka trunk) forward, and not having to support/update the 4.4 patch-set for this target. It may be a positive though, in that it may mean more eyes on the issue. At any rate, you should be able to leave that commit out of your build tree and continue with k4.4 on a trunk build, at least until outdated patches catch you up.