[Solved] WD MyBook Live Duo / two disks

Is this samba write or read performance (or both)? I know that the MBL single can do sustained writes at 23 MiB/s and sustained reads at 46 MiB/s to/from a samba4 share on a standard ext4 partition. Of course, this implies some changes to the config (removing "option proto 'bridge'" from the lan section of /etc/config/network and stopping & disabling all unneeded services: firewall, dnsmasq, odhcpd) and some optimizations on the clients (mounting the share via cifs and using SMB3).
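(For the client side, a mount roughly along these lines is what I mean; host name, share and mount point are just placeholders. vers=3.0 selects the SMB3 dialect explicitly, which matters on clients that would otherwise fall back to SMB1.)

# mount -t cifs //mybooklive/public /mnt/nas -o vers=3.0,guest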

(Would be nice to hear more performance numbers.)

I'm not sure I can explain what I'm thinking of... If you put different versions on each disk, then you have to manage the boot and root partitions very carefully, or you will create even more problems because of the differing kernel versions.

I mean the boot process looks like this: sata1:0 (boot) --> sata0:1 (root), so the partitions should be written crosswise. In this case the official upgrade process will definitely fail.

Ah, the sata1:0 and sata0:1 terminology is limited to u-boot and doesn't apply to linux. The deciding factor currently is the "root=/dev/sda2" cmdline in the mbl-boot.scr script. And this is where it will/can get funky, since linux enumerates the disks in its own way: the first fully detected disk gets to be /dev/sda, the second one /dev/sdb, etc. So even with the patch from @takimata it can still happen that if the sata 1:1 disk is slow to respond/initialize, the sata 0:1 disk will win the race and get to be /dev/sda... And there's not much you can do, except move away to PARTUUID / UUID.
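(For illustration only, this is a sketch: the value is a placeholder that you'd read off with blkid, and it assumes the kernel is built with support for PARTUUID-based root devices.)

# blkid /dev/sda2

and then, in mbl-boot.scr, replace "root=/dev/sda2" in the kernel cmdline with something like:

root=PARTUUID=<value-reported-by-blkid> rootwait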

I don't think that's an issue, though. By the time OpenWrt boots, all available disks will have been thoroughly initialized at least twice (once by the on-flash boot script, once more by the on-disk boot script), and I've observed the sata init waiting patiently even for slow disks before handing control back to the rest of the boot script.


Yes, let's hope so :slight_smile:.

(Though "respond/initialize" was poorly worded; it should have been just "respond", since Linux performs its own device discovery asynchronously. And sadly, it does not care that the HDD in sata 0 (if present) is supposed to be sda, and likewise that the HDD in sata 1 is supposed to be sdb.)

Both read and write seem to be limited to around 20 MByte/s; top shows a lot of IO load, which is probably due to btrfs or possibly a wonky device driver.

As far as the configuration goes:

-O2 instead of -Os by default (https://github.com/diizzyy/openwrt/commit/648711831477cf624c4d58e91a064de302e4813d), musl 1.1.20-pre, i.e. your patch (https://github.com/diizzyy/openwrt/commit/99bee9e9de5360e070b29f6281922ea07cbce0e2), and musl default opt (https://github.com/diizzyy/openwrt/commit/9bd3cafe3762db05affcc5cb978e9f23b0c8713a).

Unneeded services are disabled; however, 'option proto bridge' is still on. I'm not very familiar with UCI, so if you could provide a template I'd be grateful.

SMB3 would probably help, but most of the load is IO and not really Samba-related.

Yeah, it's the crc32c of btrfs. I formatted my test MBL with btrfs, started a samba transfer and ran perf.


# perf top


   PerfTop:    4533 irqs/sec  kernel:82.6%  exact:  0.0% [4000Hz cpu-clock],  (all, 1 CPU)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    26.60%  [kernel]       [k] __crc32c_le
    10.85%  [kernel]       [k] __copy_tofrom_user
     2.78%  [kernel]       [k] cpm_idle
     2.74%  [kernel]       [k] __softirqentry_text_start
     1.03%  [kernel]       [k] emac_poll_rx
     0.92%  [kernel]       [k] ata_scsi_queuecmd
     0.85%  [kernel]       [k] tcp_rcv_established
     0.62%  [kernel]       [k] __kmalloc_track_caller
     0.60%  [ip_tables]    [k] ipt_do_table
     0.59%  [kernel]       [k] __netif_receive_skb_core
     0.52%  [kernel]       [k] __wake_up_common_lock
     0.51%  [kernel]       [k] finish_task_switch
     0.45%  [kernel]       [k] kmem_cache_alloc
[...]

Oops, it's 'option type bridge' and not 'option proto bridge':

# uci delete network.lan.type
# uci commit
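After that, the change still has to be applied to the running configuration; a network restart (or a reboot) takes care of it:

# /etc/init.d/network restart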

Just as a heads-up: Right now I really don't have time and energy to fiddle around with git/github settings to maneuver around some red tape when submitting the patch for 18.06, so I withdrew the PR. If someone feels inclined to backport the patch (i.e., the incredibly complex task of swapping two zeroes and ones), feel free to do that.

Yeah, this is exhausting.

A little-known feature of git is that you can attribute the authorship of a commit by adding an extra

From: User <email@country.cc>

line at the top of the commit message. This is usually done when the "messenger" isn't the author, but it can also be used to fix GitHub's idea of the commit author.
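For example, to set the author directly (name and address are placeholders):

# git commit --author="Original Author <author@example.com>" -m "your message"

or to fix up the author of the most recent commit afterwards:

# git commit --amend --author="Original Author <author@example.com>" --no-edit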

Once again, I kindly ask @chunkeey or @takimata to check WD Community for the latest patches by Ewald with working NCQ.
Link to Ewald's Google drive.
The most interesting patch is "mbl_sata". It has the NCQ-fixed driver inside, but it's for kernel 4.9.119.

I found some "go-faster stripes" in Netgear's "premium N" WNDAP620 GPL code for the IBM EMAC ethernet driver (TSO, tested on IPv4). I put it into the staging repository for anyone who wants to see close to 70 MiB/s samba read performance with an ext4 fs :wink:.
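(If you want to check that the offload is actually active on a patched image, ethtool can report it, assuming the ethtool package is installed; tcp-segmentation-offload should show up as "on".)

# ethtool -k eth0 | grep tcp-segmentation-offload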


Ooh, exciting. I have a mightily stupid question, though: Where is that staging repository? That would be the "staging" branch of your "apm82181-lede" repo?

I'm rather eager to try those improvements, in hopes that not only Samba but also rsync will speed up. I "back up" my data across a few drives and locations, and especially "cloning" a new 2 TB disk is no fun if rsync never goes beyond 20 MB/s.

btrfs? It'll hardly help as CRC32c eats up most of the CPU time...

No, I use ext4. I have actually never looked into why rsync never went beyond ~20 MB/s, and I cannot do it right now (I'm on the other side of the world for the next ~2 months, and I only have one MBL with me). But it seemed to be a hard limit, even with compression and everything else disabled.

Try rsync without compression; it's known to slow down (limit) rsync a lot, even on x86.
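(For reference, something along these lines; paths and host name are placeholders. Leaving out -z disables compression, and --whole-file additionally skips the delta algorithm, which also costs CPU.)

# rsync -a --whole-file --progress root@mybooklive:/mnt/data/ /mnt/backup/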

As I said, I did. No compression, not even checksums. It didn't make any significant difference, even between two MBLs connected to the same gigabit switch. I seem to remember that the MBL running the rsync daemon was hitting 100% CPU, but I'm not in a position to confirm that right now.

Hmm, if you are set on btrfs, and the NAS and PC are connected via ethernet (WLAN will not really work here), and you don't mind the extra btrfs overhead on your PC, you could go via iSCSI. OpenWrt's package library already has the tgt package, but currently it requires a custom image, since it needs CONFIG_KERNEL_AIO, which isn't enabled by default.

So you could export the btrfs partition as an iSCSI target and let the beefier PC (as the iSCSI initiator) do the heavy filesystem operations on its own.
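Roughly sketched with tgt's tgtadm (target name, tid and the backing partition are placeholders, and this assumes tgtd is already running):

# tgtadm --lld iscsi --op new --mode target --tid 1 --targetname iqn.2019-02.local.mbl:data
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --backing-store /dev/sda3
# tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address ALL

The PC would then log in with an iSCSI initiator (e.g. open-iscsi) and mount the btrfs filesystem locally.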

Sadly, TSO won't help you much there, as it offloads ethernet TCP TX, so it is only interesting for cases where you copy stuff from the NAS to your PC. For the use case you are looking at, we'll need to dig up some variant of "LRO" for the emac.


I'm digging this up again since I just upgraded my MBL to 18.06.2 and again found the CPU hitting 100% at a ~19 MB/s transfer rate (samba in this case, although the load is not in smbd itself). I looked into the commit history, and as far as I can see the TSO patches are not in master, correct?

(Background: I'm not completely unfamiliar with building from source, but my track record is very much hit and miss, so I usually resort to the ImageBuilder for my slightly customized images.)

Edit: I dug a little deeper, and upstream seems to disagree with at least part of the "net/ibm/emac: wrong bit is used for STA control" patch. I also see you are quite active there; any new developments?

@HimuraCarter,
I have moved all MyBookLive patches to GitHub, no need to look at years of blog history :wink:. Please note that there is also a modified EMAC network driver that supports hardware TSO (checksum offloading), Interrupt Coalescing, SYSFS, masking of Carrier Extension signals, jumbo packets and more. With an MTU of 4088, it achieves 122 MB/s over netcat and 117 MB/s over samba (SMB3) read (Windows 10). To get the last 5 MB/s, I had to inline a few skb calls, but no other change to the code.

Sorry about 4.14, but I am working on 4.19. Unfortunately, 4.14 never survived the standard release test used, which is around 96 hours of torture testing; IMHO a must for a NAS that is hosting valuable data...

Rsync is a different problem, though. With Debian rsync, it's ~25 MB/s for larger files (MBL NAS to MBL NAS). I once fixed rsync to do ~70 MB/s between two MBLs, but I lost that code during a torture test that corrupted the whole drive, along with the user-space crypto API code that went with it :unamused:


@takimata,
the STA control patch was completely bogus and was correctly reverted in 4.20, as the code already took care of newer and older HW. Hence 4.19 will need to be patched for proper operation:

if (emac_has_feature(dev, EMAC_FTR_HAS_NEW_STACR))
    r = EMACX_STACR_STAC_WRITE;
else
    r = EMAC_STACR_STAC_WRITE;