USB-SSD not waking (spinning) up quickly enough, causing read/write errors

What do you want to see?
Booting shows no errors at all, except that minidlna cannot access the drive because the drive is mounted later; minidlna is then restarted.

The error itself shows up as different log entries,
so I do not think the actual error entry helps much.
It can look like this:
[ 1070.035054] EXT4-fs error (device sda1): __ext4_new_inode:1120: comm smbd: failed to insert inode 22020103: doubly allocated?
(writing attempt)

The only useful thing I saw was the connection between the problem (unplayable file), the error log entry, and the fact that the previous entry in the log is always a wakeup call to the drive. The time gap between the wakeup call and the actual error line is less than 0.8 s, which I think is rather short.

The wakeup-call is logged like this:
usb 2-1: reset SuperSpeed USB device number 2 using xhci-mtk
This line always appears when the drive is accessed while in sleep mode and needs to be woken.
It indicates either that OpenWrt knows the drive is sleeping, or that it resets the device because the drive did not respond to a previous access. However, it does not give the drive much time after that.
After less than 0.8 s it reports an error, and that error can be totally different each time.
Only the time gap is nearly the same.
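
To check how consistent that gap really is, the timestamps can be compared mechanically. Below is a minimal sketch in Python, meant to be run on a PC against a saved copy of the kernel log. The function name is hypothetical, and the reset timestamp in the sample is invented for illustration; only the error line is taken from the log quoted above.

```python
import re

# Kernel log lines start with a timestamp such as "[ 1070.035054]".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def reset_to_error_gaps(lines):
    """Pair each USB reset line with the first error line that follows it.

    Returns a list of (reset_time, error_time, gap_seconds, error_text).
    """
    gaps = []
    last_reset = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines without a kernel timestamp
        timestamp, message = float(match.group(1)), match.group(2)
        if "reset SuperSpeed USB device" in message:
            last_reset = timestamp
        elif "error" in message.lower() and last_reset is not None:
            gaps.append((last_reset, timestamp, round(timestamp - last_reset, 6), message))
            last_reset = None  # only the first error after a reset is paired
    return gaps


# ASSUMPTION: the reset timestamp below is made up; the error line is from the thread.
sample = [
    "[ 1069.300000] usb 2-1: reset SuperSpeed USB device number 2 using xhci-mtk",
    "[ 1070.035054] EXT4-fs error (device sda1): __ext4_new_inode:1120: "
    "comm smbd: failed to insert inode 22020103: doubly allocated?",
]

for reset_ts, error_ts, gap, _message in reset_to_error_gaps(sample):
    print(f"reset at {reset_ts:.6f}, error at {error_ts:.6f}, gap {gap:.3f} s")
```

Feeding it the full saved log instead of `sample` would show whether every error really follows a reset by roughly the same interval.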

It's a command that will allow you to read/write to the drive remotely... (without smb or dlna)

You need to isolate the drive from the services... disable minidlna... disable samba... and try with something else once the drive is asleep.

Also set your debug level higher...

What I want to see is this:

Wouldn't it be possible to somehow tell OpenWrt to wait longer for drive responses?
I think everything would be fine then.

I know I have read about this before: people reported a similar problem with (real) HDDs not spinning up fast enough, and the system did not detect them or had similar read problems.

I think it would not matter how I access the drive or which application or service I use.
I could try on the command line to do some stuff and maybe it will cause a different error entry
but the cause is still the same. Drive not responding in time.
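
For reference, one kernel-side wait setting that does exist is the per-device SCSI command timeout, exposed in sysfs as `/sys/block/<device>/device/timeout` (in seconds). The sketch below only reads and writes that file. It assumes the drive is `sda` as in the log, and that Python is available, which a stock OpenWrt image usually does not include (the same file can be handled with `cat` and `echo`). The helper names are hypothetical. Whether this timeout is related to the errors here is not established: its default is commonly 30 seconds, far longer than the 0.8 s gap, which would suggest the error is not a plain timeout expiring.

```python
from pathlib import Path


def scsi_timeout_path(device="sda", sysfs_root="/sys"):
    """Location of the SCSI command timeout attribute for a block device."""
    return Path(sysfs_root) / "block" / device / "device" / "timeout"


def read_scsi_timeout(device="sda", sysfs_root="/sys"):
    """Return the timeout in seconds, or None if the attribute is absent."""
    path = scsi_timeout_path(device, sysfs_root)
    if not path.exists():
        return None
    return int(path.read_text().strip())


def write_scsi_timeout(seconds, device="sda", sysfs_root="/sys"):
    """Set the timeout in seconds (needs root on a real system)."""
    scsi_timeout_path(device, sysfs_root).write_text(f"{int(seconds)}\n")


# Prints None on a machine that has no sda device.
print("current timeout:", read_scsi_timeout())
```

The value does not survive a reboot or a re-plug of the device, so it would have to be reapplied after each mount if it turned out to help.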

How do I change the log level?
Maybe I can find out more with that.
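
As a partial pointer: on Linux the kernel's console log level lives in `/proc/sys/kernel/printk`, which holds four numbers (console_loglevel, default_message_loglevel, minimum_console_loglevel, default_console_loglevel). The sketch below reads that file and raises the first value. The helper names are hypothetical and Python is assumed to be installed, which is not the case on a stock image. Note that this level only filters what is printed to the console; `dmesg` already shows the messages in the kernel ring buffer regardless of it, so this may not reveal more than the log already contains.

```python
from pathlib import Path

PRINTK_FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def read_printk(path="/proc/sys/kernel/printk"):
    """Return the four printk levels as a dict keyed by field name."""
    values = [int(value) for value in Path(path).read_text().split()]
    return dict(zip(PRINTK_FIELDS, values))


def set_console_loglevel(level, path="/proc/sys/kernel/printk"):
    """Rewrite the file with a new console level, keeping the other three."""
    levels = read_printk(path)
    levels["console_loglevel"] = int(level)
    line = "\t".join(str(levels[name]) for name in PRINTK_FIELDS)
    Path(path).write_text(line + "\n")


# Example (as root on the router): set_console_loglevel(8) makes the console
# show everything up to debug-level messages.
```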

98% of solving a problem is defining it....

OpenWrt is an ecosystem consisting of:

1) VENDOR HARDWARE ( internal and external )
2) VENDOR FIRMWARE ( internal and external )
3) The Linux Kernel
4) Linux Kernel Drivers ( usb / ext4 )
5) Runtime Binaries ( init / ubus / netifd )
6) Application Binaries ( samba / minidlna )
7) Userland Scripts and Utilities ( shell / hdparm / etc. )

So which part of OpenWrt did you want to "tell"?

At first there were some filesystem errors that showed up during boot.
I repaired these. They were most likely caused by failed write attempts.
After that, booting was fine.

Accessing the drive produced different errors, like this one:
[ 1070.035054] EXT4-fs error (device sda1): __ext4_new_inode:1120: comm smbd: failed to insert inode 22020103: doubly allocated?
or another one (which I did not copy)
that said something like "cannot access directory sda/dlna/something".
This was a directory listing that failed. The directory was then reported to be empty (which it actually was not).
There were also some other errors which I do not remember exactly.
(If really needed, I can copy them the next time I see them.)
The errors differ only because I am accessing the drive differently each time:
reading a file, writing a file, reading a directory listing, etc.
The cause is still the same: the drive does not respond in time.

As I said in another reply, there are so far two important log lines.
The first is a wakeup call to the drive, which looks like this:
usb 2-1: reset SuperSpeed USB device number 2 using xhci-mtk
This is actually odd, because the system should not be able to know that the drive is in sleep mode.
Either it does know, or this line is already a second attempt to reach the drive (perhaps a reset) because a previous attempt failed or was not answered in time.

The second is the error line, with whatever content, which appears less than 0.8 s after the first line.
This might indicate that the drive did not respond in time.

I do not know if there was another access attempt before this USB reset.
As soon as I access data, the LED on the USB case goes on. Then either
it works (the response came quickly enough) or it does not (the response came too late and an error is reported).

I used the same USB case before on another router.
There it all worked well because the router waited for the drive.
Waiting means maybe 1 or 2 s, not more.

I know that it's a complex structure.
The part I want to tell must be a system part that handles access to drives or to USB devices.

Maybe we can find out which part reports the problem first.
Like "whoever knows first must be responsible".

Would this be possible by setting the log level higher?
How do I do that?

FWIW... there is a fair chance your issue is between 4 and 5...

This happens to be where hotplug resides...

Look up some articles on hotplug and loglevels... when the drive puts itself to sleep... it should notify the kernel...

it looks as if something is broken at this stage of events. but first it's wise to exclude applications ...

I had a different problem before. I installed the
opkg install block-mount
package to make the USB drive work, and it detected the drive again as soon as it went to sleep.
(The sleep was still handled by the USB case, not the system.)
Any time the case sent the drive to sleep, the block-mount package detected a new(?) USB device and woke it up again just a second later.
That's why I removed this package.

Then I needed to mount the drive manually, but that's OK.
So now there is most likely no such hotplug function anymore.

Please tell me how to do that,
so that I can collect some new data here :)

Copy-pasting the actual error messages would have been one hundred times more informative for us, and one thousand times less time-consuming for you.

I suspect we are focusing on the drive sleeping, but I would like to explore other possibilities, starting with the USB controller going to sleep.
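
One way to look at the USB side, as a sketch only: kernels built with runtime power management expose a `power` directory for each USB device in sysfs, where `control` is `auto` (the kernel may suspend the device) or `on` (kept awake), alongside `autosuspend_delay_ms` and `runtime_status`. The helper below just reads those attributes. It assumes the device path `2-1` from the reset line in the log and a system with Python installed; the function name is hypothetical, and the files may be missing entirely if the kernel was built without runtime PM.

```python
from pathlib import Path

POWER_ATTRIBUTES = ("control", "autosuspend_delay_ms", "runtime_status")


def usb_power_state(device="2-1", sysfs_root="/sys"):
    """Read the runtime power attributes of one USB device.

    Attributes that are missing or unreadable are reported as None.
    """
    power_dir = Path(sysfs_root) / "bus" / "usb" / "devices" / device / "power"
    state = {}
    for name in POWER_ATTRIBUTES:
        try:
            state[name] = (power_dir / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


# On a machine without a USB device at 2-1 this prints None for every field.
print(usb_power_state())
```

If `control` reads `auto`, writing `on` to that file (as root) is the usual way to rule out kernel-side autosuspend as a factor.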

I did not copy all the error messages since they did not point to a specific problem.
I have been a programmer for 20 years and I have seen similar things before.
When the error message is totally different each time, then the message is not the problem; something else is.
That's why the error message says something like "cannot access directory" when it actually can.
The same goes for writing or accessing files.

Therefore my focus is on the usb / sleep possibility.
But since the USB Case handles that sleeping mode function itself,
I am not sure if the system gets informed of that at all.

When I still had the block-mount package installed, it redetected the USB drive as soon as it went to sleep,
waking it up again each time. So no sleep was possible at all.
Maybe the system is informed when sleep mode is entered. I don't know.

On wakeup there is an entry in the log:
usb 2-1: reset SuperSpeed USB device number 2 using xhci-mtk
It is possible that this is done because the system knows the drive is sleeping and wakes it up this way (but then why not wait for it to wake up? That could take 5-10 s or more...)
or
it is done because the drive did not respond (usually it should respond in no time) and the system just tries to reset it to make it work again. After this USB reset entry in the log, the next entry is an error (of any kind) that follows no more than 0.8 s later. So there is no waiting.

Still, it is strange: if the system knows that the drive is asleep, why not wait for it to wake up?

Can you please copy & paste a complete log instead of isolated snippets?

Before that, it would be good to raise the log level to see more of what happens at that moment.
How do I do that?

The more I think about this, the more I am convinced that this is a USB issue and not an SSD issue: it is not the SSD going to sleep that causes this issue, but the USB controller going to sleep.

However, you seem to have a better idea of what is happening, and a clear understanding of what info we need; I do not want to become a burden here, so I will refrain from posting here.

My best wishes.

Yes, the problem is somehow related to USB.
There is the USB port on the router, where the USB case is connected,
and the SATA SSD in that USB case.

As far as I can tell, this particular USB case is able to send the SATA SSD to sleep,
or simply turns it off or something like that. This happens 1 minute after the last access.
I have other USB cases which do not do that. Only a specific hardware version of these Kingston USB 3.0 cases does that.

One thing I remember just now: years ago I used this external USB case for backups.
I remember that the Acronis software on the CD I booted from was unable to wake the drive.
It did detect it, but before I could get through the backup settings the drive went to sleep.
After that, no more access was possible because the Acronis software could not wake the drive.

So I guess now that OpenWrt is able to do that and is actually doing it.
What it does not do is wait for the drive to wake up, or it has some other issue with this function.

A well-behaved SSD with a well-behaved SSD-to-USB controller should respond almost immediately to a wake request.

That you have had problems with the combination in the past on a different OS strongly suggests that the problem lies with the drive or, more likely, the case.

Without full logs, it is impossible to diagnose further.

Also, the power savings compared with idle or even average operation are minimal. Assuming you're running the "budget" A400-series SSD, https://www.kingston.com/datasheets/sa400_us.pdf shows ~0.20 W idle, ~0.28 W average. 0.2 (W) * 24 (hrs/day) * 365 (days/year) / 1000 (W/kW) ~ 1.75 kWh/year -- even at US$0.30 per kWh, that's hardly worth worrying about. Similarly, it's negligible if you're running off battery/UPS compared to the router itself.
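
The same estimate as a small Python calculation, for anyone who wants to plug in their own measurements. It uses 365 days per year, the idle and average wattages quoted from the datasheet, and the US$0.30 per kWh price mentioned here; the function name is just for illustration, and the enclosure's own consumption is not included.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts):
    """Energy used in one year by a constant load of `watts`."""
    return watts * HOURS_PER_YEAR / 1000


IDLE_W = 0.20         # datasheet idle figure quoted in this post
AVERAGE_W = 0.28      # datasheet average figure quoted in this post
PRICE_PER_KWH = 0.30  # US$ per kWh, the rate used in this post

for label, watts in (("idle", IDLE_W), ("average", AVERAGE_W)):
    energy = annual_kwh(watts)
    print(f"{label}: {energy:.2f} kWh/year, about ${energy * PRICE_PER_KWH:.2f}/year")
```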

I also wrote that I had this combination running on my previous DLNA server for many years without any problems. There is no problem with this drive or USB case. It's just that the device it is connected to has to support the sleep function.

Also, the power saving is about 90%; I checked it myself.

There's a package called luci-app-hd-idle that you might try if you haven't already. I don't use it, but it should let you control when your drives sleep (I think). Also, if your case is USB 3, you might install kmod-usb-storage-uas. I'm not too familiar with it either, but it's apparently needed for some extra USB 3 functionality.

I tried some of these before.
As you can read on the Internet, such packages do not work over USB.
They are made for SATA (a SATA drive connected directly to a SATA port).
My drive is a SATA drive, but it's connected via USB.

In this case you cannot send SATA commands to the drive,
because it is connected via USB and USB does not support that.

I tried hd-idle, hdparm and others before and could not get any of them to work.
I guess that would be easy if the router had a SATA port, but it does not have one.
Therefore I was happy that the USB case did the job.

PS: I am not even sure what the USB case is doing.
I know it shuts the drive off (LED goes off, power consumption goes down 90%), yes.
But I cannot tell whether the USB case tells the drive to go to sleep or if it just
turns off its power. I guess it will not be possible to find that out.