Slow Samba4 write performance

Hi,

I'm running OpenWrt 19.07 on a MyBook Live device (thanks for supporting it), which has an 800 MHz single-core CPU, 256 MB of RAM and one 1GbE adapter.

The device itself is a NAS, so not the typical OpenWrt use case, but it is supported, and thanks to an outline written by a community member I'm working my way toward having a fully operational NAS again.

So far I've managed to get a working system with partitions mounted and also managed to configure smartd for hard drive health monitoring. The hard drive itself is a brand new WD Red 4 TB.

TL;DR

After installing and configuring Samba 4, I get slow write speeds: up to 80 Mbps, while read speeds are around 190 Mbps. I've tried disabling async I/O: reads then top out at 240 Mbps, but writes remain the same. To rule out a disk or system issue, I tested write speed on the NAS itself (classic dd if=/dev/zero of=/mnt/ext4partition/somefile) and got around 75 MB/s, so the bottleneck is not there. During write transfers CPU usage hardly exceeds 50% and a lot of RAM is used as cache (upper rows in top); once no more memory is free (though still available) for caching, transfers get a little slower, but not significantly.
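Since the network figures above are in Mbps and the dd figure is in MB/s, a small conversion helps when comparing them. This is a minimal sketch, assuming decimal prefixes and 8 bits per byte; the function names are made up for illustration.

```shell
# Convert between megabits/s (network figures) and megabytes/s (dd figures).
# Assumes 1 byte = 8 bits and the same decimal prefix on both sides.
mbps_to_mbytes() { awk -v v="$1" 'BEGIN { printf "%.1f\n", v / 8 }'; }
mbytes_to_mbps() { awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 8 }'; }

mbps_to_mbytes 80    # Samba write speed from the post, expressed in MB/s
mbytes_to_mbps 75    # local dd write speed from the post, expressed in Mbps
```

With the numbers from the post, 80 Mbps is about 10 MB/s over the network, while the local 75 MB/s corresponds to about 600 Mbps.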

I would like to further investigate the issue, but don't know how. Is this a known problem with Samba 4? Is there any solution?

Samba4 is slower than Samba3. Setting -O2 as the default (overall) optimization will help, and so will disabling encryption. Other than that, you can try the kernel module instead, but it's not as mature as Samba.

Do we actually have any data on this? With smb3, samba4 should actually be much more efficient regarding network packets.

That's odd: if Samba4 CPU usage were the actual bottleneck, we would expect it to be near 100%.

You could try experimenting with different filesystems and cluster sizes. Maybe try FAT32 with 64-256k clusters, just to exclude the FS as an issue, or try an SSD as a baseline.

Ext4 includes various features that decrease raw I/O speed, e.g. writing to a journal or writing timestamps on update. So you should check the mount options when comparing raw NAS vs OpenWrt. Also, be sure the buffers are flushed before the end of the tests.
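As a rough illustration of the flushing point, here is a sketch of a local write test that includes the flush in the measured time. The helper name write_test and the 2048 MiB size in the example are assumptions, not something from the thread; only the dd-from-/dev/zero approach and the target path come from the original post.

```shell
# write_test FILE SIZE_MB: write SIZE_MB MiB of zeros, flush, print elapsed seconds.
write_test() {
    file=$1
    size_mb=$2
    start=$(date +%s)
    # bs is given in plain bytes (1 MiB) to avoid suffix differences between dd builds
    dd if=/dev/zero of="$file" bs=1048576 count="$size_mb" 2>/dev/null || return 1
    sync    # force cached data to disk so the timing covers the real write
    end=$(date +%s)
    echo $((end - start))
}

# Example (size is an arbitrary choice, ideally well above the 256 MB of RAM):
# write_test /mnt/ext4partition/somefile 2048
```

Dividing the size by the printed number of seconds gives an approximate MB/s figure that is not inflated by data still sitting in the page cache. Whole-second resolution is coarse, so larger files give more meaningful numbers.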

Do you know which smb protocol version is actually used by your nas / clients?

It is more efficient if you have the processing power, but on slow systems (such as MIPS or ARMv5) it's actually slower. You can tweak it somewhat, but it's usually not worth your time compared to spending $20-30 or so on an SBC that's magnitudes faster.

Do we actually have some data on this? I.e. a test of samba3 vs 4 with the same settings on the same system?
I just wonder, since the SMB server core should not have degraded in performance; they mainly added more and more features that target professional/business use cases.

PS: But yes, a single core at 800 MHz is kinda low overall. As noted, try testing ksmbd, and if it has the same slow speeds, just get a new device.

I did notice a difference playing around with the WiTi board, but I haven't tried it recently (as in, within a year or so) and I didn't keep any numbers. Either way it's deprecated, so it doesn't really matter in the end. There was also a difference in load on my Kirkwood device (now recycled), but USB2 was mainly the limiting factor.

Instead of immediately asking for faster hardware, simply evaluate the I/O load. Is it really necessary to write timestamps on ext4? Is the data really so valuable that the journal is required?
You drastically reduce the I/O load when disabling these features. And there are even more tuning possibilities with ext4, like "data=writeback".
All these options also save (some) CPU cycles.

I think you misread or at least missed my point?

Wow, that's a lot of answers, thank you very much for your support.

I'm using prebuilt packages from the OpenWrt repositories; not sure how they are built.

I can test it more thoroughly (rather than simply looking at top while transferring :laughing:), but I'm pretty sure the bottleneck isn't there.

Sure, I can easily do that with plain files and the loop device trick. Just let me know if that could be an issue and whether it's better to go with actual partitions.

Unfortunately I don't have a spare SSD lying around :frowning:

Don't remember putting it when I formatted the partition, and dump.f2fs doesn't mention it.

/dev/sda3 on /mnt/data type ext4 (rw,noatime,nodiratime,data=ordered)
Not sure about that data=ordered (uci show fstab follows), but no timestamps.

fstab.@global[0]=global
fstab.@global[0].anon_swap='0'
fstab.@global[0].anon_mount='0'
fstab.@global[0].auto_swap='1'
fstab.@global[0].auto_mount='1'
fstab.@global[0].delay_root='5'
fstab.@global[0].check_fs='0'
...
fstab.@swap[0]=swap
fstab.@swap[0].enabled='1'
fstab.@swap[0].device='/dev/sda4'
fstab.@mount[2]=mount
fstab.@mount[2].target='/mnt/data'
fstab.@mount[2].device='/dev/sda3'
fstab.@mount[2].enabled='1'
fstab.@mount[2].options='rw,noatime,nodiratime'
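To double-check which options are really in effect (as opposed to what is configured in fstab), one could read them back from the kernel's mount table. A minimal sketch; the two helper names are invented for this example, and /proc/mounts is assumed to be available.

```shell
# mount_opts MOUNTPOINT [MOUNTS_FILE]: print the option string for a mount point.
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# has_opt OPTIONS NAME: succeed if NAME is one of the comma-separated options.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# opts=$(mount_opts /mnt/data)
# has_opt "$opts" noatime && echo "noatime is active"
# has_opt "$opts" data=ordered && echo "journal mode is data=ordered"
```

For the mount shown above, this would report both noatime and data=ordered as active, even though data=ordered does not appear in the configured options.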

Not sure what you refer to, the NAS runs OpenWrt.

Not really, clients are all Windows 10, latest version. I can investigate with WireShark.

I will for sure, with both Samba 3.6 and ksmbd, and post the results (with and without encryption).

Powershell: Get-SmbConnection

Ok, it took a while, but I've run all the tests.

| Setup | Read [Mbps] | Write [Mbps] | Dialect | CPU % | Notes |
|---|---|---|---|---|---|
| Samba 4 | 190 | 80 | 3.1.1 | 50 | |
| Samba 4 + Sync I/O | 240 | 80 | 3.1.1 | 50 | |
| Samba 4 + Sync I/O + no encryption | 350 | 150 | 3.1.1 | 50 | Starts slow, gets faster after a few seconds |
| Samba 3.6 | 330 | 80 | 2.0.2 | 50 | |
| Samba 3.6 + Sync I/O | 300 | 85 | 2.0.2 | 50 | |
| Samba 3.6 + Sync I/O + no encryption | 300 | 95 | 2.0.2 | 40-50 | Higher CPU usage on read, tops at 90% |
| ksmbd | N/A | N/A | N/A | N/A | Required dependency package kmod-fs-ksmbd is not available in any repository |
| ksmbd + Sync I/O | N/A | N/A | N/A | N/A | Required dependency package kmod-fs-ksmbd is not available in any repository |
| ksmbd + Sync I/O + no encryption | N/A | N/A | N/A | N/A | Required dependency package kmod-fs-ksmbd is not available in any repository |

Now my question is: where can I get ksmbd to test? Is it in SNAPSHOTs only?

I tried with the hints provided here, but I get errors, any suggestion? 128 seems to be the highest possible value to -s, so no idea how to represent 256k clusters. Maybe with bigger sectors (-S) ?

It's in snapshots and 19.07.1 as "ksmbd" and in 19.07 as "smbd"; it was an upstream name change.

Just upgraded to 19.07.1 (previously on 19.07.0) and found it, testing right now.

Yes, maybe I mixed it up with the exFAT cluster sizes, so it could be that FAT32 has lower limits.

Try gparted or any partition tool; those usually allow the maximum possible. In practice 16/32/64 are good values if you want to reduce some overhead and mainly store larger (>4 KB) files.

https://www.partitionwizard.com/download/v11.6-portable/11x64.zip

ksmbd has no sync I/O option and SMB3 encryption is off by default. To enable it, add smb3 encryption = yes to the global section of the ksmbd template.

You can also force samba4 into smb-2.0.2 mode via: server max protocol = SMB2_02 for a direct comparison with samba36.

Ok, so I can confirm Windows 10 negotiates dialect 3.1.1. Writes are a little faster (around 90-100 Mbps), but every operation hangs for a second before completion (as if it was missing the latest ACK) and never completes. Also, each modification I make to /etc/ksmbd/smb.conf gets lost after /etc/init.d/ksmbd restart. Modifying /etc/ksmbd/smb.conf.template works, and without encryption (smb encrypt = off) I get ~160 Mbps writes on average, but the numbers fluctuate a lot.

Is this still needed? I would only do that for testing purposes and use ext4 in a real environment. Also, this NAS (running OpenWrt) has no USB ports, just the disk, so I would do that on plain files with loop devices and file systems inside those files.

That's because it is regenerated from your ksmbd UCI config file and the template. You are not supposed to edit smb.conf directly; use LuCI, or edit the template or the UCI config file.
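For illustration, adding an option to the template rather than to the generated smb.conf could be scripted roughly like this. It is only a sketch: the helper name add_global_option is made up, and it assumes the template contains a line starting with [global]; the template path, the option and the restart command are the ones mentioned in this thread.

```shell
# add_global_option TEMPLATE LINE: insert LINE right after the [global] header.
add_global_option() {
    tmpl=$1
    line=$2
    tmp="$tmpl.tmp.$$"
    # Copy every line; after the [global] header also emit the new option.
    awk -v l="$line" '{ print } /^\[global\]/ { print "\t" l }' "$tmpl" > "$tmp" || return 1
    # Write back in place so the template keeps its owner and permissions.
    cat "$tmp" > "$tmpl" && rm -f "$tmp"
}

# Example:
# add_global_option /etc/ksmbd/smb.conf.template 'smb3 encryption = yes'
# /etc/init.d/ksmbd restart
```

Because smb.conf is regenerated on restart, the change only takes effect (and survives) when it goes into the template or the UCI config.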