I can report that your fix worked! The box boots up fine with two drives, even with multiple partitions, and there are no more reboots on concurrent disk access!
Thanks again!
Edit: One issue I found, which I think is unrelated to your fix, is that when I edit mount points through LuCI, OpenWrt freezes and needs a hard restart.
To replicate the issue with the mount points, do the following:
1. Install LuCI.
2. Go to System -> Mount Points.
3. Click Add.
4. Select /dev/sda3 by UUID.
5. Set the mount point to /mnt/data.
6. Save & Apply.
The "Applying configuration" timer times out, my SSH session is dropped, and while I can still ping the device, neither SSH nor HTTP respond!
More testing shows that this issue already existed on 17.01.4 as well.
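In case it helps anyone, the same mount point can be added from the CLI, which avoids LuCI's apply step entirely. A rough sketch (the UUID is a placeholder; take the real one from 'block info'):

# add the mount section to /etc/config/fstab
uci add fstab mount
uci set fstab.@mount[-1].uuid='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
uci set fstab.@mount[-1].target='/mnt/data'
uci set fstab.@mount[-1].enabled='1'
uci commit fstab
# mount it without restarting the fstab service
block mount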
Still testing your fix, by the way; everything is working smoothly.
Well, the good news is: I quickly found this PR for LuCI. It talks about this very same issue:
Default behaviour of changes to fstab (Mount Points) was
to use /etc/init.d/fstab restart, however this unmounts
filesystems via block umount which can cause the device
to fail, so replace the initscript call with an exec
of 'block mount'.
The bad news: the PR has been abandoned. You can still apply the patch manually and try again, though. At least for the MBL Single it did the trick.
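If you don't want to patch LuCI, the difference the PR describes boils down to what gets run after saving the mount point, roughly:

/etc/init.d/fstab restart   # what LuCI runs today: unmounts via block umount, then mounts again, which can hang the device
block mount                 # what the PR runs instead: just (re)mounts entries from /etc/config/fstab without unmounting anything first

So as a stopgap you can add the entry to /etc/config/fstab and run 'block mount' by hand over SSH.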
As silly as it sounds, at the moment, I don't have the two unused disks to test on my MBL Duo.
What I do have is a single spare disk and nothing short of awesome news: It seems your patch unleashed the full power of the SATA port. Where I was previously hitting a really hard limit at around 82 MB/s for reading and 27 MB/s for writing, I am now getting this:
root@OpenWrt:/mnt# time dd if=/dev/zero of=tempfile bs=1M count=1024
1024+0 records in
1024+0 records out
real 0m 13.65s
user 0m 0.01s
sys 0m 11.89s
root@OpenWrt:/mnt# time dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
real 0m 8.41s
user 0m 0.01s
sys 0m 4.70s
This means: 121 MB/s reading and 75 MB/s writing!
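For anyone checking the math: dd here only prints the timing, so the throughput is simply the 1024 MiB written divided by the elapsed wall-clock time, e.g.

awk 'BEGIN { printf "write: %.1f MB/s, read: %.1f MB/s\n", 1024/13.65, 1024/8.41 }'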
Seems like our collective assumption -- that the original MBL firmware's significantly better read/write performance was owed to the nonstandard 64 kB block size -- was wrong. It was the SATA driver all along.
Edit: The drive is a WD Green WD10EARX taken from an older MBL Single. I repeated the test a few times with even larger files to rule out any caching, I'm still seeing the same great performance. OpenWrt is now completely on par with the original MBL firmware's performance.
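For what it's worth, the standard way to take the page cache out of the equation between the write and the read run (plain Linux, nothing MBL-specific) is:

sync
echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes before the read test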
root@RED:/mnt/data# time dd if=/dev/zero of=tempfile bs=1M count=1024
1024+0 records in
1024+0 records out
real 0m 15.38s
user 0m 0.01s
sys 0m 11.19s
root@RED:/mnt/data# time dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
real 0m 6.71s
user 0m 0.00s
sys 0m 5.36s
Thank you for confirming my findings. Your results correspond to 152 MB/s reading and 66 MB/s writing (the latter seems a tad low; I wonder if your drive is still lazy-init'ing the filesystem).
Maybe a bit off topic, but have you managed to use RAID for the system partitions (/dev/sda1 and /dev/sda2)?
I kind of liked how the official firmware handled that.
root@NAS:/mnt/data# time dd if=/dev/zero of=tempfile bs=1M count=1024
1024+0 records in
1024+0 records out
real 0m 15.23s
user 0m 0.01s
sys 0m 12.63s
root@NAS:/mnt/data# time dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
real 0m 7.91s
user 0m 0.00s
sys 0m 6.82s
That's ~67 MB/s writing, ~130 MB/s reading. Out of curiosity:
a) The RAID is initialized, but is the ext4 filesystem too? Unless you specify -E lazy_itable_init=0,lazy_journal_init=0 when formatting (see the sketch below), the drive will "lazy init" the inodes during regular operation for quite some time (you can see the activity LED blinking even while the drive is not in use), and I believe that's why you are seeing slightly lower write performance than I did in my test. It could also be the CPU overhead of RAID, though; ultimately, drive performance is still limited by the CPU.
b) do you know/remember the performance with the original firmware?
Both of these are really not issues, though. I'm thrilled with the performance of my MBLs now.
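To be concrete about a), a full (non-lazy) format would look roughly like this (sda3 just as an example device, and obviously only for a fresh filesystem, since it reformats the partition):

mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/sda3   # takes longer up front, no background init afterwards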
I don't know... Is there any way to tell if the filesystem initialization is complete?
I never tested pure disk performance this way, only over the network. That was about 100 MB/s reading and 40-50 MB/s writing (on gigabit Ethernet, but there were too many other factors involved).
I will do this test with the original firmware later today.
During the RAID rebuilding I saw speed well over 160 MB/s in /proc/mdstat. That's impressive.
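That number is just what /proc/mdstat prints while the resync runs, i.e. checking it periodically:

cat /proc/mdstat   # shows a progress bar and the current resync speed per array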
Anyway, I'm happy with this throughput...
No, unfortunately the lazy initialization does not publish any stats to the system. But you can tell by the constant blips in drive activity. Anyway, it's not that horribly important.
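If you want something more tangible than the blinking LED, you can watch the disk's raw write counters while it is otherwise idle; once they stop creeping up, the lazy init is done (sda as an example device):

cat /sys/block/sda/stat        # field 7 = total sectors written so far
grep ' sda ' /proc/diskstats   # same counters, compare two readings a few seconds apart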
Speed with the factory firmware (and with some unnecessary services stopped: access, avahi-daemon, netatalk, orion, etc.):
NAS:/DataVolume# time dd if=/dev/zero of=tempfile bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.4665 s, 103 MB/s
real 0m10.507s
user 0m0.012s
sys 0m9.676s
NAS:/DataVolume# time dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 7.25067 s, 148 MB/s
real 0m7.257s
user 0m0.008s
sys 0m5.680s
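For reference, stopping those extra services was just the usual init scripts; the exact script names may differ between firmware versions, so treat this as a sketch:

/etc/init.d/netatalk stop
/etc/init.d/avahi-daemon stop
# likewise for the WD-specific services (access, orion, ...), whatever their init scripts are called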