[Solved] WRT1900ACV1 reboots: kernel 4.9

I've reverted the removal of Linux 4.4 support. I currently don't have time to debug this myself, but please let me know if you guys make any progress in figuring out the root cause of this issue.

Requesting an update on this issue. WRT1900AC V1 users are still reporting reboots after installing the latest LEDE snapshots with 4.9.x kernels.

It's important we get V1 hardware fixed due to security issues running kernel 4.4.x.

Still happens here. It may stay up for 3 or 4 days, then reboot twice within a few minutes, with no rhyme or reason.
4.9.27 and mwlwifi 10.3.4.0-20170512. And as usual no crash.log.

David, what security issues are you worried about? Kernel 4.4 is still maintained (linux foundation, not LEDE) and you can submit patches (or just apply locally) to be pulled into the tree if the devs don't update fast enough -- nbd has made sure kernel 4.4 is still an option for mvebu.

IMO this is a nuisance for anyone building their own -- you just have to do two independent builds.

The bigger issue is that LEDE releases (or anything from the buildbot) are not really a viable option for the AC V1.

@davidc502 I'm on 4.4.67 for ar71xx, x86_64 and mt7621. I can post the patch if you want for 17.01.1. .68 is expected any minute now though.

Kernels 4.5 down to 2.6 are vulnerable to remote code execution within the kernel as root. Maybe this has already been patched? The information about this vulnerability was just released last month.

I'm pretty sure it has. Thing is, the kernel devs don't explicitly refer to the CVEs in the changelogs, so you really need to track them down, you can't just grep the changelog for CVEs...

Just check the NIST entry and you'll see the Linux kernel was patched in January 2016 (!). On top of that, the page clearly states the vulnerability is in 4.4.60 and older kernels ;).

You are running the DIR-860L as well, right? How's the 4.4.67 kernel treating you? Are you using SQM by any chance? The current master branch and 17.01 branch both have issues with SQM on the mt7621 devices. It can cause stack traces and crashes that result in a reboot. Kernel 4.9 seems to have fixed it for me, but it is causing other issues for me. Was wondering whether kernel 4.4.67 is any good on mt7621 devices with SQM enabled.

For further discussion on the aforementioned issues, please see the end of this thread:
https://forum.openwrt.org/t/optimized-build-for-the-d-link-dir-860l/

And this bug report (please vote for it if you would like the developers to focus on this bug):

I am, yes. But there's no difference between the 17.01.x 4.4 versions and the .67 one. I'm not switching to trunk until 4.9 is stable enough on ramips.

I already voted for your bug report as well :slight_smile:

Just noticed that crash dump is enabled for ARM via https://github.com/lede-project/source/commit/48d71ab5021e5238623bab2f87b6425b2609c60a. Can anyone give it a try?

From the look of that it's enabled by default? I just tested r4228-43e4e1f and there was no crash log created at /sys/kernel/debug/ (unless it's supposed to be somewhere else now?). My uptime was ~2hrs before lockup/reboot

Yes, it is enabled by default. I have no idea then.
nbd, can you help to find the root cause?

@nbd Do you or anyone else have time to look into this now? I don't think we're making any progress on our own. Sorry to hassle you with a direct mention...

@Mushoz, thought this might be of interest to you. It might be a weird coincidence, but running on a recent 17.01 (pre 17.01.2, so to speak) branch with a GCC 6.3 build (instead of a GCC 5.4 build), I am looking at almost 4 days and 12 hours uptime. This is with kernel 4.4.70.

Fingers crossed of course, but remarkable (usually I have one or more reboots a day).

Sorry for the offtopic :slight_smile:

That sounds very good! Thank you very much for letting us know :slight_smile: Is this with Cake enabled? And if so, have you tried heavy traffic to see if it doesn't crash?

Yeah maybe move the DIR-860L chat elsewhere. We're already struggling to be noticed without being buried in our own topic.

Anyone know if this WRT1900AC V1 Reboot on 4.9 Kernel build problem is being worked on by somebody?

I would like to go to the 4.9 Kernel builds but have experienced the reboots and reverted back to 4.4.70.

DISTRIB_DESCRIPTION='LEDE Reboot SNAPSHOT r4512-f3ae0f8'
Kernel = 4.9.34
SQM enabled.

Spontaneous reboot during long large downloads. I have remote logging enabled but nothing collected.

I use a Linksys WRT1900ACS v2 with the LEDE build compiled by Daniel, named SuperWRT, and everything is working fine!

http://s.go.ro/8qfvfhf1

https://superwrt.download/firmware/

EDITED: sorry, prematurely posted from fat-finger

Even though I said I was dropping this... I can't let something like this go and I think I may be on to something.

Poring through changes from 4.4 to 4.9, the device tree changes appear straightforward enough but I decided to decompile the compiled device tree and voila... the mamba dts (decompiled) is very different on 4.9 than 4.4. Here is one excerpt:

			crypto@90000 {  // kernel 4.4
				compatible = "marvell,armada-xp-crypto";
				reg = <0x90000 0x10000>;
				reg-names = "regs";
				interrupts = <0x30 0x31>;
				clocks = <0x8 0x17 0x8 0x17>;
				clock-names = "cesa0", "cesa1";
				marvell,crypto-srams = <0xf 0x10>;
				marvell,crypto-sram-size = <0x800>;
			};

			crypto@90000 {  // kernel 4.9
				compatible = "marvell,armada-xp-crypto";
				reg = <0x90000 0x10000>;
				reg-names = "regs";
				interrupts = <0x30 0x31>;
				clocks = <0x7 0x17 0x7 0x17>;
				clock-names = "cesa0", "cesa1";
				marvell,crypto-srams = <0xe 0xf>;
				marvell,crypto-sram-size = <0x800>;
			};

Clocks, interrupts, etc. all differ between 4.4 and 4.9. Looking further, I think it's this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/arch/arm/boot/dts/armada-370-xp.dtsi?h=linux-4.9.y&id=55877f58b0be83a4ffb4639a83f99c28df418e3e
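
For anyone who wants to repeat the comparison on their own builds, here is a minimal Python sketch of the property-by-property check on the crypto@90000 excerpt above. The helper names (`parse_props`, `changed_props`) are made up for illustration, and the parser only handles simple one-line `name = value;` properties, so treat it as a starting point rather than a real dts parser.

```python
import re

# Excerpts copied from the post: the crypto@90000 node as decompiled
# from the 4.4 and the 4.9 builds.
NODE_44 = """
compatible = "marvell,armada-xp-crypto";
reg = <0x90000 0x10000>;
reg-names = "regs";
interrupts = <0x30 0x31>;
clocks = <0x8 0x17 0x8 0x17>;
clock-names = "cesa0", "cesa1";
marvell,crypto-srams = <0xf 0x10>;
marvell,crypto-sram-size = <0x800>;
"""

NODE_49 = """
compatible = "marvell,armada-xp-crypto";
reg = <0x90000 0x10000>;
reg-names = "regs";
interrupts = <0x30 0x31>;
clocks = <0x7 0x17 0x7 0x17>;
clock-names = "cesa0", "cesa1";
marvell,crypto-srams = <0xe 0xf>;
marvell,crypto-sram-size = <0x800>;
"""

# Matches a single-line property such as: reg-names = "regs";
PROP_RE = re.compile(r'^\s*([#\w,.+-]+)\s*=\s*(.+?);\s*$')


def parse_props(text):
    """Hypothetical helper: map property name -> raw value string."""
    props = {}
    for line in text.splitlines():
        match = PROP_RE.match(line)
        if match:
            props[match.group(1)] = match.group(2)
    return props


def changed_props(old_text, new_text):
    """Return {name: (old_value, new_value)} for properties that differ."""
    old, new = parse_props(old_text), parse_props(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    for name, (old, new) in changed_props(NODE_44, NODE_49).items():
        print(f"{name}: {old} -> {new}")
```

On this one excerpt only `clocks` and `marvell,crypto-srams` come out as changed. Both look like phandle references, which dtc numbers automatically, so a shift there may only reflect nodes being added or reordered elsewhere in the tree. That is an interpretation, not something confirmed here.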

I created a patch to undo a couple changes and the device tree is now nearly identical. Without completely reverting a few changes I'm not sure it can be made identical but I've got a mamba running on 4.9 now and I've been beating up the wifi with iperf for a few hours now and everything is looking OK. Keep your fingers crossed...

In case you want to build it yourself and help rack up test hours more quickly (since the reboots can be pretty fickle), here is the patch:

--- /arch/arm/boot/dts/armada-xp.dtsi	2017-07-17 21:13:53.372001000 -0500
+++ /arch/arm/boot/dts/armada-xp.dtsi	2017-07-18 20:08:18.874801712 -0500
@@ -372,6 +372,4 @@
 
 &spi1 {
 	compatible = "marvell,armada-xp-spi", "marvell,orion-spi";
-	pinctrl-0 = <&spi1_pins>;
-	pinctrl-names = "default";
 };
--- /arch/arm/boot/dts/armada-370-xp.dtsi	2017-07-15 05:17:55.000000000 -0500
+++ /arch/arm/boot/dts/armada-370-xp.dtsi	2017-07-15 04:58:03.000000000 -0500
@@ -148,6 +148,26 @@
 				interrupts = <50>;
 			};
 
+			spi0: spi@10600 {
+				reg = <0x10600 0x28>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+				cell-index = <0>;
+				interrupts = <30>;
+				clocks = <&coreclk 0>;
+				status = "disabled";
+			};
+
+			spi1: spi@10680 {
+				reg = <0x10680 0x28>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+				cell-index = <1>;
+				interrupts = <92>;
+				clocks = <&coreclk 0>;
+				status = "disabled";
+			};
+
 			i2c0: i2c@11000 {
 				compatible = "marvell,mv64xxx-i2c";
 				#address-cells = <1>;
@@ -300,42 +320,6 @@
 				status = "disabled";
 			};
 		};
-
-		spi0: spi@10600 {
-			reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28>, /* control */
-			      <MBUS_ID(0x01, 0x1e) 0 0xffffffff>, /* CS0 */
-			      <MBUS_ID(0x01, 0x5e) 0 0xffffffff>, /* CS1 */
-			      <MBUS_ID(0x01, 0x9e) 0 0xffffffff>, /* CS2 */
-			      <MBUS_ID(0x01, 0xde) 0 0xffffffff>, /* CS3 */
-			      <MBUS_ID(0x01, 0x1f) 0 0xffffffff>, /* CS4 */
-			      <MBUS_ID(0x01, 0x5f) 0 0xffffffff>, /* CS5 */
-			      <MBUS_ID(0x01, 0x9f) 0 0xffffffff>, /* CS6 */
-			      <MBUS_ID(0x01, 0xdf) 0 0xffffffff>; /* CS7 */
-			#address-cells = <1>;
-			#size-cells = <0>;
-			cell-index = <0>;
-			interrupts = <30>;
-			clocks = <&coreclk 0>;
-			status = "disabled";
-		};
-
-		spi1: spi@10680 {
-			reg = <MBUS_ID(0xf0, 0x01) 0x10680 0x28>, /* control */
-			      <MBUS_ID(0x01, 0x1a) 0 0xffffffff>, /* CS0 */
-			      <MBUS_ID(0x01, 0x5a) 0 0xffffffff>, /* CS1 */
-			      <MBUS_ID(0x01, 0x9a) 0 0xffffffff>, /* CS2 */
-			      <MBUS_ID(0x01, 0xda) 0 0xffffffff>, /* CS3 */
-			      <MBUS_ID(0x01, 0x1b) 0 0xffffffff>, /* CS4 */
-			      <MBUS_ID(0x01, 0x5b) 0 0xffffffff>, /* CS5 */
-			      <MBUS_ID(0x01, 0x9b) 0 0xffffffff>, /* CS6 */
-			      <MBUS_ID(0x01, 0xdb) 0 0xffffffff>; /* CS7 */
-			#address-cells = <1>;
-			#size-cells = <0>;
-			cell-index = <1>;
-			interrupts = <92>;
-			clocks = <&coreclk 0>;
-			status = "disabled";
-		};
 	};
 
 	clocks {

Here is the remaining diff in the compiled dts:

> 						linux,default-trigger = "disk-activity";
...
> 				};
> 
> 				spi1-pins {
> 					marvell,pins = "mpp13", "mpp14", "mpp16", "mpp17";
> 					marvell,function = "spi1";

Both are artifacts of further changes that I haven't undone yet, TBD whether they matter.

Best of luck...
