[Solved] WD MyBook Live Duo / two disks

Peeking into the detailed kernel logs, I found that during device creation the /sbin/hotplug-call script runs several times. In a successful boot these log entries are in pairs: launched/finished.

[    4.614312] procd: Launched hotplug exec instance, pid=554
...
[    4.773213] procd: Finished hotplug exec instance, pid=554

In a failed boot there is one process without a "Finished" entry.

[    6.410719] procd: Launched hotplug exec instance, pid=594

If anyone is interested I can attach these detailed logs, but they are ~50 kB each...
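
For anyone who wants to check their own boot log for the same pattern, here is a minimal sketch that lists hotplug pids with a "Launched" entry but no matching "Finished" entry. It assumes the log lines look exactly like the ones quoted above; the function name unfinished_hotplug is made up for illustration.

```shell
# Print the pid of every hotplug exec instance that was launched but never
# reported as finished. Reads the log on stdin.
unfinished_hotplug() {
    awk '
        /procd: Launched hotplug exec instance, pid=/ {
            pid = $0; sub(/.*pid=/, "", pid); launched[pid] = 1
        }
        /procd: Finished hotplug exec instance, pid=/ {
            pid = $0; sub(/.*pid=/, "", pid); delete launched[pid]
        }
        END { for (pid in launched) print pid }
    ' | sort -n
}
```

Usage would be something like `dmesg | unfinished_hotplug`, or feeding it a saved log file on stdin. An empty result means every launched instance also finished.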

hm, kinda. The x86 image uses either root=UUID (for ext4) or rootfs=squashfs to discover the rootfs on all partitions and disks (and this can be racy). On the MBLs this is currently fixed to root=/dev/sda2. So there's a slight difference there.
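
To see which form a given image actually boots with, the root= argument can be pulled out of the kernel command line. This is only a sketch: the helper name root_arg is hypothetical, and it just splits the string on whitespace.

```shell
# Print the value of the root= argument from a kernel command line string.
# Prints nothing if no root= argument is present.
root_arg() {
    for word in $1; do
        case "$word" in
            root=*) echo "${word#root=}" ;;
        esac
    done
}
```

On a running system this could be called as `root_arg "$(cat /proc/cmdline)"`. On an MBL it should print /dev/sda2 if the fixed setting described above is in effect.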

hm, could it be that the problem is the "concurrent access"? I.e. access to both disks at the same time? If so, you should be able to reproduce the bug in failsafe mode by loading both drives at the same time, i.e.:

# dd if=/dev/sda of=/dev/null &
# dd if=/dev/sdb of=/dev/null &
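
A bounded variant of the same test, as a rough sketch: it reads a fixed amount from both devices in parallel and waits for both readers, so a freeze shows up as a command that never returns. The function name stress_both and the default of 1024 blocks are assumptions, not something from the patch.

```shell
# Read COUNT MiB (default 1024) from two devices at the same time.
# Returns 0 only if both dd processes exit successfully.
stress_both() {
    count="${3:-1024}"
    dd if="$1" of=/dev/null bs=1M count="$count" 2>/dev/null &
    pid_a=$!
    dd if="$2" of=/dev/null bs=1M count="$count" 2>/dev/null &
    pid_b=$!
    wait "$pid_a"; rc_a=$?
    wait "$pid_b"; rc_b=$?
    [ "$rc_a" -eq 0 ] && [ "$rc_b" -eq 0 ]
}
```

For example: `stress_both /dev/sda /dev/sdb 2048 && echo survived`.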

Well, I pushed a patch to the ML:

apm821xx: attempt to fix sata access freezes. It would be somewhat comical, if all that was necessary was to set a few bits. But oh well...
Anyway, once the patch gets picked up by master, the buildbots will automatically create images.

Thank you very much for all your efforts. Can I try your patch somehow? I am willing to test it. Do I need to go through the image generation process?
I am currently on 17.01.4 and experiencing the same issues as the other forum members.

Thanks

The phase1 builder just finished the process of making an -snapshot / -master image.

http://phase1.builds.lede-project.org/builders/apm821xx%2Fsata/builds/911

you can get the image from http://downloads.lede-project.org/snapshots/targets/apm821xx/sata/
(make sure the date is 16th July! in case you hit some sort of cache).

Oh, make no mistake. It's an "attempt". See apm821xx: attempt to fix sata access freezes.

so, it will depend on whether it magically fixes the problem or not. If it does, then it will get ported to all the active releases [17.01.6 (if there is such a release) and 18.06].

Hello,

I can report that your fix worked! Boots up fine with two drives even with more partitions, and no more reboot on concurrent disk access!

Thanks again!

Edit: One issue I found, which I think is unrelated to your fix, is that when I edit mount points through LuCI, OpenWrt freezes and needs a hard restart.

Well, I was about to say, that it would be nice to have some "Tested-by: " tags. But...

Ok, do you think you can write down what you did? I can certainly give it a try on the MBL Single too.

To replicate the issue with the mount points, do the following:
1. Install LuCI.
2. Go to System -> Mount Points.
3. Click Add.
4. Select /dev/sda3 by UUID.
5. Set the mount point to /mnt/data.
6. Click Save & Apply.

The "applying configuration" timer times out and I am thrown out of SSH. I can still ping the device, but neither SSH nor HTTP responds!
More testing shows that this issue existed on 17.01.4 as well.

Still testing your fix by the way, everything is working smoothly

Oh boy.

Well, the good news is: I quickly found this PR for luci. It talks about this very same issue

Default behaviour of changes to fstab (Mount Points) was
to use /etc/init.d/fstab restart, however this unmounts
filesystems via block umount which can cause the device
to fail, so replace the initscript call with an exec
of 'block mount'.

The bad news: The PR has been abandoned. You can still apply the patch manually and try it again. At least for the MBL Single it did the trick.
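
If you would rather skip LuCI for this step entirely, the mount can also be described directly in /etc/config/fstab. The sketch below only prints a mount stanza. The helper name fstab_mount_stanza is hypothetical and the UUID is whatever your own partition reports; nothing here is taken from the PR itself.

```shell
# Print an /etc/config/fstab mount stanza for a filesystem UUID and a target.
# Both values are supplied by the caller; nothing here touches the system.
fstab_mount_stanza() {
    printf "config mount\n"
    printf "\toption uuid '%s'\n" "$1"
    printf "\toption target '%s'\n" "$2"
    printf "\toption enabled '1'\n"
}
```

On the device, the UUID can be read with `block info`, the printed stanza appended to /etc/config/fstab, and the result applied with `block mount` instead of `/etc/init.d/fstab restart`, which is the same substitution the PR description talks about. I have not verified that this avoids the freeze in every setup.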

You are correct once again! I made the change and it worked

Thanks

As silly as it sounds, at the moment, I don't have the two unused disks to test on my MBL Duo.

What I do have is a single spare disk and nothing short of awesome news: It seems your patch unleashed the full power of the SATA port. Where I was previously hitting a really hard limit at around 82 MB/s for reading and 27 MB/s for writing, I am now getting this:

root@OpenWrt:/mnt# time dd if=/dev/zero of=tempfile bs=1M count=1024
1024+0 records in
1024+0 records out
real    0m 13.65s
user    0m 0.01s
sys     0m 11.89s

root@OpenWrt:/mnt# time dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
real    0m 8.41s
user    0m 0.01s
sys     0m 4.70s

This means: 121 MB/s reading and 75 MB/s writing!
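
For reference, the arithmetic behind those figures is just size divided by wall-clock time. A small sketch (the helper name mib_per_s is made up); note that dd's bs=1M means 1 MiB, so strictly these are MiB/s:

```shell
# mib_per_s SIZE_MIB SECONDS: print throughput in MiB/s with one decimal.
mib_per_s() {
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

mib_per_s 1024 8.41     # read:  prints 121.8
mib_per_s 1024 13.65    # write: prints 75.0
```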

Seems like our collective assumption -- that the original MBL firmware's significantly better read/write performance was owed to the nonstandard 64 kB block size -- was wrong. It was the SATA driver all along.

Edit: The drive is a WD Green WD10EARX taken from an older MBL Single. I repeated the test a few times with even larger files to rule out any caching, I'm still seeing the same great performance. OpenWrt is now completely on par with the original MBL firmware's performance.

well, I left a comment about it:

you can maybe do the same.

well, I can add that to the 18.06 commit message.

root@RED:/mnt/data# time dd if=/dev/zero of=tempfile bs=1M count=1024
1024+0 records in
1024+0 records out
real    0m 15.38s
user    0m 0.01s
sys     0m 11.19s

root@RED:/mnt/data# time dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
real    0m 6.71s
user    0m 0.00s
sys     0m 5.36s

Thank you for confirming my findings. Your results correspond to 152 MB/s reading, 66 MB/s writing (which seems a tad low, I wonder if your drive is still lazy init'ing the file system).

In any case, I posted the patches to 18.06 and 17.01 (no idea if there will be a 17.01.6 or not) with an updated commit message.

https://patchwork.ozlabs.org/project/openwrt/list/?submitter=72473

Also "discord being discord":

Let others join the conversation

This topic is clearly important to you – you've posted more than 20% of the replies here.

Are you sure you’re providing adequate time for other people to share their points of view, too?

Thank you all!

I'm sorry, I have no physical access to my MBLD during my holiday, hopefully I can start testing in the coming days.

Just one question: will this patch be included in the final 18.06 release? I prefer using stable releases on my "production" devices.

That is the plan, yes:

Unless there are legitimate objections I expect them to slip in before the release of 18.06 proper.

Everything is working smoothly with snapshot build so far... disks, partitions, raid, samba...

However, I cannot find the nfs-kernel-server package. Is there a specific reason it was left out?

Can I use the rc1/rc2 packages instead?
http://downloads.openwrt.org/snapshots/packages/powerpc_464fp/packages/
http://downloads.openwrt.org/releases/18.06.0-rc2/packages/powerpc_464fp/packages/

Edit: it's working perfectly with the rc2 package. :slight_smile:

Maybe a bit off topic, but have you managed to use RAID for the system partitions (/dev/sda1 and /dev/sda2)?
I rather liked how the official firmware handled that.