WRT3200ACM - boot problem with anon_mount 1 in fstab

EDITED SUMMARY:

This looks like a nasty bug as it will hit newcomer users who have flashed LEDE just once and are configuring their router.

Simply adding "anon_mount 1" option to /etc/config/fstab will make router unbootable if you still have Linksys original firmware in the other partition.

Failsafe will work, and editing /etc/config/fstab there (or deleting it) is enough to recover, but LEDE failsafe may be hard to find for new users :frowning:

  • Router hangs at boot when it tries to auto-mount the inactive rootfs partition that contains the original Linksys OEM contents in case you have /etc/config/fstab containing "anon_mount 1" option.
    • fstab init script calls fstools "/sbin/block mount", which hangs permanently at boot (or if issued from command line later)
    • The original Linksys partition contains raw jffs2 instead of ubi. That is likely the reason for the hang
  • The bug will not get triggered if both firmwares have been flashed with LEDE or Openwrt, as then also the inactive rootfs partition contains ubi.

So far this has been found in WRT3200ACM, but the same will likely happen with all WRT1900/1200/3200ACx devices that have dual firmware partitions.

The hang can be seen from kernel log if the command was issued from console and another console is used to read the log. The log gets filled with this at 0.2s intervals:

kern.err kernel: [ 3032.483307] pxa3xx-nand f10d0000.flash: Wait time out!!!

Original:

I have run into strange trouble with my new WRT3200ACM.

For some reason enabling the anon_mount option in /etc/config/fstab prevents the boot process from completing. The boot progresses to some stage and it e.g. answers ping, but power led blinks and it does not allow login and does not provide dhcp etc. services.

I have had that option in all my routers (WNDR3700, WNDR3800, R7800) to make USB memory sticks to automount when inserted into the router's USB port. So it is a quite normal setting.

I first compiled a firmware image from master with kernel 4.9, and that failed to boot. I thought that the new kernel 4.9 might be the reason. So I installed 17.01.0-rc2 official image and it booted ok. I installed packages, configured, rebooted and everything was ok as long as I did not change that anon_mount to 1 in fstab. After changing that, the next boot again gets stuck. (Editing the option in failsafe enables the router to boot normally.)

Boot completes ok with the output of "block detect" entered into fstab:

config 'global'
        option  anon_swap       '0'
        option  anon_mount      '0'
        option  auto_swap       '1'
        option  auto_mount      '1'
        option  delay_root      '5'
        option  check_fs        '0'

Boot does not complete with anon_mount 1 in /etc/config/fstab:

config 'global'
        option  anon_swap       '0'
        option  anon_mount      '1'
        option  auto_swap       '1'
        option  auto_mount      '1'
        option  delay_root      '5'
        option  check_fs        '0'

I have no good guess about the reason. My initial guess was some connection to ubi1:syscfg being mounted as /tmp/syscfg as that is different than in my other devices, but it is hard to see how that cound be the reason. (That mounting is done in /lib/preinit/81_linksys_syscfg )

After some investigation it looks like the reason is actually "/sbin/block mount".

At the boot the init script /etc/init.d/fstab actually runs "/sbin/block mount".
Running that command manually from console (with anon_mount 1) makes the console to hang. From another console window I can see that system & kernel log fills from entries like these:

kern.err kernel: [ 3032.483307] pxa3xx-nand f10d0000.flash: Wait time out!!!
kern.err kernel: [ 3032.683301] pxa3xx-nand f10d0000.flash: Wait time out!!!
kern.err kernel: [ 3032.883296] pxa3xx-nand f10d0000.flash: Wait time out!!!
kern.err kernel: [ 3033.083293] pxa3xx-nand f10d0000.flash: Wait time out!!!
kern.err kernel: [ 3033.283291] pxa3xx-nand f10d0000.flash: Wait time out!!!
kern.err kernel: [ 3033.483287] pxa3xx-nand f10d0000.flash: Wait time out!!!
kern.err kernel: [ 3033.683283] pxa3xx-nand f10d0000.flash: Wait time out!!!

And the "block" process can't even be killed with "kill -9" or "kill -15".
I rebooted the router from power switch to get it killed.

With anon_mount 0 in fstab the same command works ok and produces:

root@LEDE:~# /sbin/block mount
block: /dev/ubiblock0_0 is already mounted on /rom
block: /dev/ubi0_1 is already mounted on /overlay

@cybrnook & other wrt1900ac series gurus, any experience with this?
Does anon_mount work in other wrt1900ac series devices?

1 Like

Anon mount works fine on my wrt1200 fwiw, i have noticed that boot will take a lot longer after a new mount, but it does boot eventually.

I'm so glad you have finally received yours! Maybe you can fix at least some of the issues with LEDE on this box!

Really strange.
I flashed the other partition with LEDE master with kernel 4.9, and tested fstab anon_mount there. It seemed to work there, so I switched the router to boot again from the same 17.01 firmware that failed yesterday with anon_mount. And to my surprise it works now with this 17.01 firmware, too. (The same firmware, as I have not reflashed it)

The only difference is that now the other partition has been flashed once with LEDE, as yesterday there was still the Linksys OEM firmware.

Now "/sbin/block mount" works also with anon_mount 1 active, and shows a second ubi partition. Unlike yesterday:

root@LEDE:~# /sbin/block mount
block: /dev/ubiblock0_0 is already mounted on /rom
block: /dev/ubi0_1 is already mounted on /overlay
block: /dev/ubi1_0 is already mounted on /mnt/ubi1_0

root@LEDE:~# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/root                 2560      2560         0 100% /rom
tmpfs                   256812       592    256220   0% /tmp
/dev/ubi0_1              58604      5512     50064  10% /overlay
overlayfs:/overlay       58604      5512     50064  10% /
ubi1:syscfg              71908       476     67724   1% /tmp/syscfg
tmpfs                      512         0       512   0% /dev
/dev/ubi1_0              71908       476     67724   1% /mnt/ubi1_0

It looks like the syscfg partition gets now mounted both as /tmp/syscfg and /mnt/ubi1_0

So, just a hypothesis, but possibly /sbin/block misdetects something when the other partition contains the Linksts OEM firmware. Possibly some naming or formatting is slightly different?

EDIT:
Other alternative is that the sysupgrade with settings saved (to master with 4.9) has changed the syscfg partition somehow, as I think that the mvebu sysupgrade utilises that partition for saving the settings in sysupgrade.

1 Like

I just got a brand new 3200acm as well. I can't reproduce this problem here, but maybe I misunderstand the setup. I flashed a self build lede firmware once (version 17.01.0-rc2); never ran sysupgrade on it. On the other partition I assume I still have the original Linksys firmware (I guess that's the one you are talking about, not Netgear). I have a usb stick connected to the router, booting with "anon_mount 1" works just fine.

/etc/config/fstab:

config global
option anon_swap '0'
option anon_mount '0'
option auto_swap '1'
option auto_mount '1'
option delay_root '5'
option check_fs '0'

config mount
option target '/mnt/sda2'
option uuid 'd99873c8-a42f-4f19-9f5b-68571ba47558'
option enabled '1'
option fstype 'ext3'

/sbin/block mount:

block: /dev/ubiblock0_0 is already mounted on /rom
block: /dev/ubi0_1 is already mounted on /overlay
block: /dev/sda2 is already mounted on /mnt/sda2

df:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/root                 4864      4864         0 100% /rom
tmpfs                   256388        92    256296   0% /tmp
/dev/ubi0_1              56540       420     53196   1% /overlay
overlayfs:/overlay       56540       420     53196   1% /
ubi1:syscfg              71908       300     67900   0% /tmp/syscfg
tmpfs                      512         0       512   0% /dev
/dev/sda2               907576      1204    859436   0% /mnt/sda2

Yep, I corrected that to my message.
Too long with several different Netgear routers, that it came automatically to the text :frowning:

Could you try removing the USB stick and its mount segment from /etc/config/fstab, leaving just the "global" block with edited anon_mount 1

That would mimic my failing setup pretty closely.

You are also missing this line that appeared for me after having sysupgraded the second partition, too:
"/dev/ubi1_0 71908 476 67724 1% /mnt/ubi1_0

so your system is pretty much like mine was when it failed.

Failsafe works to get out of the problem, and you can edit fstab in failsafe (if you are familiar with that)

or you might even test it like I did: two console windows open, so that if/when one of then freezes due to "/sbin/block mount" command, you can edit the fstab in the other window to have anon_mount 0 before you reboot (using the power switch)

Ups, with the proper setup I could reproduce it as well. Helps to set "anon_mount 1".

I was playing around with the USB stick inserted, having a mount section for it, and/or having it mounted, when running "/sbin/block mount". That didn't show any difference, always the same error "pxa3xx-nand f10d0000.flash: Wait time out!!!".

Interestingly, /sbin/block info also shows the "ubi1_0" partition:

/dev/mtdblock6: TYPE="jffs2"
/dev/ubiblock0_0: UUID="643ef9c1-d0d092d2-0fa9f12b-3da3502a" VERSION="4.0" MOUNT="/rom" TYPE="squashfs"
/dev/ubi0_1: UUID="a9709d2e-35ba-4444-a544-aa10a4b4133e" VERSION="w4r0" MOUNT="/overlay" TYPE="ubifs"
/dev/ubi1_0: UUID="e16dd61c-066d-45ee-9b2b-0a92e84b06bb" VERSION="w4r0" TYPE="ubifs"
/dev/sda1: UUID="856ba000-4a72-41c5-aa1c-d374766efdf3" VERSION="1" TYPE="swap"
/dev/sda2: UUID="d99873c8-a42f-4f19-9f5b-68571ba47558" VERSION="1.0" MOUNT="/mnt/sda2" TYPE="ext3"

I don't know if the output my ubinfo -a could help:

UBI version:                    1
Count of UBI devices:           2
UBI control device major/minor: 10:59
Present UBI devices:            ubi0, ubi1

ubi   
Volumes count:                           2
Logical eraseblock size:                 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks:     592 (75169792 bytes, 71.7 MiB)
Amount of available logical eraseblocks: 0 (0 bytes)
Maximum count of volumes                 128
Count of bad physical eraseblocks:       0
Count of reserved physical eraseblocks:  40
Current maximum erase counter value:     2
Minimum input/output unit size:          2048 bytes
Character device major/minor:            252:0
Present volumes:                         0, 1

Volume ID:   0 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        39 LEBs (4952064 bytes, 4.7 MiB)
State:       OK
Name:        rootfs
Character device major/minor: 252:1
-----------------------------------
Volume ID:   1 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        509 LEBs (64630784 bytes, 61.6 MiB)
State:       OK
Name:        rootfs_data
Character device major/minor: 252:2

===================================

ubi1
Volumes count:                           1
Logical eraseblock size:                 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks:     680 (86343680 bytes, 82.3 MiB)
Amount of available logical eraseblocks: 0 (0 bytes)
Maximum count of volumes                 128
Count of bad physical eraseblocks:       8
Count of reserved physical eraseblocks:  32
Current maximum erase counter value:     2
Minimum input/output unit size:          2048 bytes
Character device major/minor:            251:0
Present volumes:                         0

Volume ID:   0 (on ubi1)
Type:        dynamic
Alignment:   1
Size:        644 LEBs (81772544 bytes, 78.0 MiB)
State:       OK
Name:        syscfg
Character device major/minor: 251:1

Edit: Also having /dev/ubi1_0 mounted with mount -t ubifs /dev/ubi1_0 /mnt/usb/ prior to running /sbin/block mount with "anon_mount 1" leads to same error.

Edit2: Also having its own mount section in /etc/config/fstab for /dev/ubi1_0 doesn't make a difference.

You might test creating a directory in /tmp/syscfg and writing a file there. Just to make LEDE to slightly manipulate the partition syscfg. (I made a copy of my settings there)

If that does not prevent the error, then the reason likely Linksys orignal firmware in the other partition. (or actually something unexpected related to its structure.)

I tested your /tmp/syscfg idea: It lead to the same error message as always.

Maybe you can flash the orignal Linksys firmware back to the other partition to see if this re-enables the bug?

I may try that.
But before I do anything else, I will add much more debugging/logging lines to "block" so that there us more clear picture where it fails.

When I get that ready, I will compile a 17.01 version and send it also to you for trying with your device. You have the error situation still intact :wink:

To me it looks that either

  • "block" calls something "mandatory" repeatedly until it succeeds, and as it does not succeed, the loop just continues
  • the flash memory driver does the same for something. It has received a mandatory call from block and fails to get a value to return.

But for both alternatives it would help great to find out where block's execution it fails.

You said that you self-compiled your rc2 firmware. Right?

If you still have your build environment intact, you might try compiling a new debug version of "block", part of the fstools package. I compiled it first for my R7800 (ipq806x) as I have that router active now, and I am currently compiling it for my WRT3200ACM. (I can then provide the binary, if you want)

There is just one patch file to be placed in a new directory package/system/fstools/patches . So, you need to create "patches" subdirectory to the fstools package and place the patch file there. The patch adds a lot of debug print lines. (I suspect that it will freeze at "waitpid starts" line.)

package/system/fstools/patches/100-add-debug.patch

then just recompile fstools:

make package/fstools/{clean,compile}

And copy the "block" executable to the router. E.g. the whole fstools package ipk to be installed with opkg, or just the "block" binary.

scp build_dir/target-arm_cortex-a9+vfpv3_musl-1.1.16_eabi/fstools-2016-12-04-84b530a7/block root@192.168.1.1:/tmp

And then run the new block:

/tmp/block mount

EDIT:

WRT3200ACM 17.01.0-rc2 "block" binary compiled:
https://www.dropbox.com/sh/sorrebbshah0pfa/AABiU6Wi_bWlIqnmwHEGEiKua?dl=0

It works in rc2 WRT3200ACM:

root@LEDE:/tmp# /tmp/block mount
block: config_load starts
block: cache_load starts
block: list_for_each_entry starts
block: mount_device for ubiblock0_0
block: mount_device for ubiblock0_0 auto_mount test
block: /dev/ubiblock0_0 is already mounted on /rom
block: mount_device for ubi0_1
block: mount_device for ubi0_1 auto_mount test
block: /dev/ubi0_1 is already mounted on /overlay
block: mount_device for ubi1_0
block: mount_device for ubi1_0 auto_mount test
block: /dev/ubi1_0 is already mounted on /mnt/ubi1_0
block: handle_swapfiles starts

The error does not get triggered for me any more, but hopefully you can test it. (So that I would not need to reinstall Linksys OEM stuff)

Like I said above, my hunch is that it gets stuck at debug line:
"exec_mount vfork default waitpid starts"

That is roughly the only place that I quickly spotted where the programs intentionally waits for something external to complete.

I compiled block myself (thanks to your detailed instructions). Here is the output of /tmp/block mount with "anon_mount 0":

block: config_load starts
block: cache_load starts
block: list_for_each_entry starts                                                                                                                                                              
block: mount_device for mtdblock6                                                                                                                                                              
block: mount_device for mtdblock6 auto_mount test                                                                                                                                              
block: mount_device for mtdblock6 extroot test                                                                                                                                                 
block: mount_device for mtdblock6 check anon_mount                                                                                                                                             
block: mount_device for mtdblock6 return from mount                                                                                                                                            
block: mount_device for ubiblock0_0                                                                                                                                                            
block: mount_device for ubiblock0_0 auto_mount test                                                                                                                                            
block: /dev/ubiblock0_0 is already mounted on /rom                                                                                                                                             
block: mount_device for ubi0_1                                                                                                                                                                 
block: mount_device for ubi0_1 auto_mount test                                                                                                                                                 
block: /dev/ubi0_1 is already mounted on /overlay                                                                                                                                              
block: mount_device for ubi1_0                                                                                                                                                                 
block: mount_device for ubi1_0 auto_mount test                                                                                                                                                 
block: mount_device for ubi1_0 extroot test                                                                                                                                                    
block: mount_device for ubi1_0 check anon_mount
block: mount_device for ubi1_0 return from mount
block: mount_device for sda1
block: mount_device for sda2
block: mount_device for sda2 auto_mount test
block: mount_device for sda2 extroot test
block: mount_device for sda2 check anon_mount
block: mount_device for sda2 return from mount
block: handle_swapfiles starts

And here with "anon_mount 1":

block: config_load starts
block: cache_load starts
block: list_for_each_entry starts
block: mount_device for mtdblock6
block: mount_device for mtdblock6 auto_mount test
block: mount_device for mtdblock6 extroot test
block: mount_device for mtdblock6 check anon_mount
block: mount_device for mtdblock6 try anon_mount
block: handle_mount starts

Thanks.
Based on that, it tries to check mtdblock6, which is one of the rootfs mtd blocks. In my system it does not get tested, but in your system with anon_mount 1 it gets also checked for anonymous mounting and it gets stuck there.

My system has this mtd structure:

root@LEDE:~# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 00200000 00020000 "u-boot"
mtd1: 00020000 00020000 "u_env"
mtd2: 00040000 00020000 "s_env"
mtd3: 00040000 00020000 "devinfo"
mtd4: 001e0000 00020000 "sysdiag"
mtd5: 05000000 00020000 "kernel1"
mtd6: 04a00000 00020000 "ubi"
mtd7: 05000000 00020000 "kernel2"
mtd8: 04a00000 00020000 "rootfs2"
mtd9: 05600000 00020000 "syscfg"
mtd10: 005c0000 00020000 "unused_area"

And currently I am booting from partition 1:

root@LEDE:~# fw_printenv boot_part
boot_part=1

In my system it does not try mtdblock6, which has name ubi. (but I am not sure which boot_part your system current boots from. The corresponding mtdblock might also be mtd 9 that is named "rootfs_2" in my router)

So with anon_mount 0 it returns ok:

block: mount_device for mtdblock6 check anon_mount                                                                                                                                             
block: mount_device for mtdblock6 return from mount                                                                                                                                            
block: mount_device for ubiblock0_0                          

But with anon_mount 1 it goes to handle_mount and immediately hangs...

block: mount_device for mtdblock6 check anon_mount
block: mount_device for mtdblock6 try anon_mount
block: handle_mount starts

I will probably try to add a few more debug statements into handle_mount.

But could you run those two commands above:

  • cat /proc/mtd
  • fw_printenv boot_part

cat /proc/mtd:

dev:    size   erasesize  name
mtd0: 00200000 00020000 "u-boot"
mtd1: 00020000 00020000 "u_env"
mtd2: 00040000 00020000 "s_env"
mtd3: 00040000 00020000 "devinfo"
mtd4: 001e0000 00020000 "sysdiag"
mtd5: 05000000 00020000 "kernel1"
mtd6: 04a00000 00020000 "rootfs1"
mtd7: 05000000 00020000 "kernel2"
mtd8: 04a00000 00020000 "ubi"
mtd9: 05600000 00020000 "syscfg"
mtd10: 005c0000 00020000 "unused_area"

fw_printenv boot_part:

boot_part=2

You are booting from the other kernel, so 6 & 8 have swapped names.
But the names mirror each other, so it is likely something about the contents.
Your mtdblock6 is the "inactive rootfs"

I will look into adding debug prints.

Ok, I modified the patch.
New patch v2 is here:
https://gist.github.com/hnyman/a7321091e33b5068b8720cbd749acbe4
https://www.dropbox.com/sh/sorrebbshah0pfa/AABiU6Wi_bWlIqnmwHEGEiKua?dl=0

You can replace the previous patch with this, and then recompile.

I added more verbose debug prints before the call to "mount_handle", and then into middle of that function, although I guess that it might freeze to the first "mount" call there, so most of them will never show up.

The bedug lines include more info about the device.

In my 3200 this is the new output:

root@LEDE:~# /tmp/block mount
block: list_for_each_entry starts
block: mount_device for ubiblock0_0, type=squashfs, version=4.0,
       label=(null), uuid=c26e44d3-16d8e8a8-6d894b88-f24ca9bc
block: mount_device for ubiblock0_0 auto_mount test
block: /dev/ubiblock0_0 is already mounted on /rom

block: mount_device for ubi0_1, type=ubifs, version=w4r0,
       label=(null), uuid=88fb78ca-bae0-4dae-898e-38075204f368
block: mount_device for ubi0_1 auto_mount test
block: /dev/ubi0_1 is already mounted on /overlay

block: mount_device for ubi1_0, type=ubifs, version=w4r0,
       label=(null), uuid=17c96137-2d34-4c4f-9454-b65e98f535ef
block: mount_device for ubi1_0 auto_mount test
block: /dev/ubi1_0 is already mounted on /mnt/ubi1_0

block: list_for_each_entry ends

Although now that I have written this, I checked your message from yesterday, and there I can see that mtdblock6 is actually a jffs2 block. But let's see what the output is

EDIT:
I made a binary copy of all mtd partitions in my router after flashing LEDE successfully. So there I have the contents of one partition as Linksys left it. And when I now look those mtdblock files, I can see that my mtdblock8 has been then a pure jffs2 block, starting with "Magic bitmask" 0x1985.

But will be interesting to see how block sees that in your live system.

Thanks for the updated version of block. I removed the mount entries in my /etc/config/fstab to clear the logs, didn't make a difference anyway.

With "anon_mount 0":

root@LEDE:/tmp# /tmp/block mount
block: list_for_each_entry starts
block: mount_device for mtdblock6, type=jffs2, version=(null),
       label=(null), uuid=(null)
block: mount_device for mtdblock6 auto_mount test
block: mount_device for mtdblock6 extroot test
block: mount_device for mtdblock6 check anon_mount
block: mount_device for mtdblock6 ends
block: mount_device for ubiblock0_0, type=squashfs, version=4.0,
       label=(null), uuid=643ef9c1-d0d092d2-0fa9f12b-3da3502a
block: mount_device for ubiblock0_0 auto_mount test
block: /dev/ubiblock0_0 is already mounted on /rom

block: mount_device for ubi0_1, type=ubifs, version=w4r0,
       label=(null), uuid=a9709d2e-35ba-4444-a544-aa10a4b4133e
block: mount_device for ubi0_1 auto_mount test
block: /dev/ubi0_1 is already mounted on /overlay

block: mount_device for ubi1_0, type=ubifs, version=w4r0,
       label=(null), uuid=e16dd61c-066d-45ee-9b2b-0a92e84b06bb
block: mount_device for ubi1_0 auto_mount test
block: mount_device for ubi1_0 extroot test
block: mount_device for ubi1_0 check anon_mount
block: mount_device for ubi1_0 ends
block: list_for_each_entry ends

With "anon_mount 1":

root@LEDE:/tmp# /tmp/block mount
block: list_for_each_entry starts
block: mount_device for mtdblock6, type=jffs2, version=(null),
       label=(null), uuid=(null)
block: mount_device for mtdblock6 auto_mount test
block: mount_device for mtdblock6 extroot test
block: mount_device for mtdblock6 check anon_mount
block: mount_device for mtdblock6 try anon_mount
block: handle_mount mount source=/dev/mtdblock6, target=/mnt/mtdblock6, fstype=jffs2

Last line got me thinking. The bug can also be triggered just by running

mount -t jffs2 /dev/mtdblock6 /mnt/mtdblock6

It never finishes, doesn't give an error that /mnt/mtdblock6 doesn't exist. In the log is the usual "pxa3xx-nand f10d0000.flash: Wait time out!!!" message.

It seems to be a problem with /dev/mtdblock6, which (if I understand the partition layout correctly) is the partition where the original linksys firmware is located.

Many thanks for the debugging.

I will update that into the bug that I have opened:
https://bugs.lede-project.org/index.php?do=details&task_id=533&order=id&sort=desc

I guess that there is not much more to be done right now.

This sounds like it needs a fix from the people who have done the latest changes to fstools block and/or mvebu specialists.

1 Like