Nfs stop when writting

Hi,
Server: Linksys WRT32X OpenWrt 19.07.3 with USB HDD
Client: Opensuse 15.1
Wired connection
In a previous thread about nfs here, I had some difficulties to run nfs without outage. Very often when I was writing my HDD through nfs, it stopped and freeze the client file manager Caja/Nautilus. I had to kill the process and had error in the HDD (fsck, superblock).
For the last 3 days, it was a lot better, no issue.
Today, I started to fill my HDD with data. I made 7 transfers with command line

time rsync -r /path/to/dir /mnt/HD5/media/to/dir; sync

Total around 280GB+144GB+25GB+130GB+250GB+26GB+5GB without any issue.
I was running again the command to transfer 240GB more and it stopped:
HDD LED stopped
Server network LED steady
Client command line waiting, I can't enter command
Client GUI file manager Caja with hourglass
Client dmesg

[19277.145775] nfs: server 192.168.1.1 not responding, still trying
[19278.170210] nfs: server 192.168.1.1 not responding, still trying

Server dmesg with ssh

[16776.980582] ieee80211 phy1: staid 1 deleted
[18044.331121] ieee80211 phy1: staid 1 deleted
[19052.914970] sd 2:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[19052.923195] sd 2:0:0:0: [sda] tag#0 Sense Key : 0x5 [current] 
[19052.929054] sd 2:0:0:0: [sda] tag#0 ASC=0x21 ASCQ=0x0 
[19052.934223] sd 2:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 00 29 ad 60 00 01 00 00
[19052.941828] print_req_error: critical target error, dev sda, sector 21850880
[19052.948916] EXT4-fs warning (device sda1): ext4_end_bio:323: I/O error 5 writing to inode 3609 (offset 43384832 size 6946816 starting block 2731616)
[19052.962279] Buffer I/O error on device sda1, logical block 2731104
[19052.968492] Buffer I/O error on device sda1, logical block 2731105
[19052.974700] Buffer I/O error on device sda1, logical block 2731106
[19052.980906] Buffer I/O error on device sda1, logical block 2731107
[19052.987116] Buffer I/O error on device sda1, logical block 2731108
[19052.993326] Buffer I/O error on device sda1, logical block 2731109
[19052.999533] Buffer I/O error on device sda1, logical block 2731110
[19053.005743] Buffer I/O error on device sda1, logical block 2731111
[19053.011949] Buffer I/O error on device sda1, logical block 2731112
[19053.018158] Buffer I/O error on device sda1, logical block 2731113
[19057.656691] JBD2: Detected IO errors while flushing file data on sda1-8
[19417.811721] sd 2:0:0:0: timing out command, waited 360s
[19417.816978] sd 2:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[19417.825211] sd 2:0:0:0: [sda] tag#0 Sense Key : 0x4 [current] 
[19417.831073] sd 2:0:0:0: [sda] tag#0 ASC=0x44 ASCQ=0x0 
[19417.836247] sd 2:0:0:0: [sda] tag#0 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
[19417.843862] print_req_error: I/O error, dev sda, sector 3905672064
[19417.850092] Aborting journal on device sda1-8.
[19778.000254] sd 2:0:0:0: timing out command, waited 360s
[19778.005513] sd 2:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[19778.013739] sd 2:0:0:0: [sda] tag#0 Sense Key : 0x4 [current] 
[19778.019598] sd 2:0:0:0: [sda] tag#0 ASC=0x44 ASCQ=0x0 
[19778.024767] sd 2:0:0:0: [sda] tag#0 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
[19778.032385] print_req_error: I/O error, dev sda, sector 3905161216
[19778.038593] Buffer I/O error on dev sda1, logical block 488144896, lost sync page write
[19778.046687] JBD2: Error -5 detected when updating journal superblock for sda1-8.
[19778.255214] EXT4-fs (sda1): Delayed block allocation failed for inode 3610 at logical offset 10240 with max blocks 2048 with error 30
[19778.267277] EXT4-fs (sda1): This should not happen!! Data will be lost
[19778.267277] 
[19778.275347] EXT4-fs error (device sda1) in ext4_writepages:2925: Journal has aborted
[20138.358912] sd 2:0:0:0: timing out command, waited 360s
[20138.364171] sd 2:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[20138.372395] sd 2:0:0:0: [sda] tag#0 Sense Key : 0x4 [current] 
[20138.378255] sd 2:0:0:0: [sda] tag#0 ASC=0x44 ASCQ=0x0 
[20138.383426] sd 2:0:0:0: [sda] tag#0 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
[20138.391042] print_req_error: I/O error, dev sda, sector 2048
[20138.396727] Buffer I/O error on dev sda1, logical block 0, lost sync page write
[20138.404076] EXT4-fs error (device sda1): ext4_journal_check_start:61: Detected aborted journal
[20138.412781] EXT4-fs (sda1): Remounting filesystem read-only
[20138.418389] EXT4-fs (sda1): previous I/O error to superblock detected
[20498.574796] sd 2:0:0:0: timing out command, waited 360s
[20498.580056] sd 2:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[20498.588290] sd 2:0:0:0: [sda] tag#0 Sense Key : 0x4 [current] 
[20498.594151] sd 2:0:0:0: [sda] tag#0 ASC=0x44 ASCQ=0x0 
[20498.599322] sd 2:0:0:0: [sda] tag#0 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
[20498.606941] print_req_error: I/O error, dev sda, sector 2048
[20498.612627] Buffer I/O error on dev sda1, logical block 0, lost sync page write
[20498.620021] nfsd: peername failed (err 107)!
[20498.624493] EXT4-fs (sda1): previous I/O error to superblock detected
[20498.631045] nfsd: peername failed (err 107)!
[20498.635722] nfsd: peername failed (err 107)!
[20498.640125] nfsd: peername failed (err 107)!
[20498.644479] nfsd: peername failed (err 107)!

After about 10 more mn waiting I have a return in the client command

rsync: [sender] write error: Broken pipe (32)
rsync: close failed on "/mnt/HD5/media/path/to/file QzrK2s": Read-only file system (30)
rsync error: error in file IO (code 11) at receiver.c(853) [receiver=3.1.3]

It looks like an HDD issue, could you confirm?
Any idea why it happen? Due to nfs?
And how to fix permanently?

Edit:
With file manager (Caja) HDD is read-only but dir and files have nobody: create and delete, nogroup .
I can't delete the last directory
Last info in server dmesg

[20498.640125] nfsd: peername failed (err 107)!
[20498.644479] nfsd: peername failed (err 107)!
[20858.788268] sd 2:0:0:0: timing out command, waited 360s
[20858.793527] sd 2:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[20858.801752] sd 2:0:0:0: [sda] tag#0 Sense Key : 0x4 [current] 
[20858.807612] sd 2:0:0:0: [sda] tag#0 ASC=0x44 ASCQ=0x0 
[20858.812778] sd 2:0:0:0: [sda] tag#0 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
[20858.820393] print_req_error: I/O error, dev sda, sector 2048
[20858.826079] Buffer I/O error on dev sda1, logical block 0, lost sync page write
[20859.123492] JBD2: Detected IO errors while flushing file data on sda1-8

In the past I had similar problems with USB3 disks connected to an AMD64-PC, that use the UAS driver. The culprit is the SATA2USB chip in the enclosure. Some are broken by design.

In that case you have to avoid the usb-uas-driver and instead force the use of the usb-storage driver, which is always used with USB2.

With lsusb -t you can see, which driver an USB device uses.

You can enforce the use of the usb-storage driver with an entry on the kernel command line:

usb-storage.quirks=174c:55aa:u,152d:0567:u

In this case the SATA2USB3 chips with the IDs 174c:55aa and 152d:0567 are marked as "uas-dirty" with the consequence, that these devices are handled by the usb-storage driver.

Your transfer speed will be lowered by a few percent, but the transfers are stable and rock-solid.

1 Like

Thank you Barney

lsusb -t
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=vhci_hcd/8p, 5000M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=vhci_hcd/8p, 480M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=orion-ehci/1p, 480M

I read "Driver=usb-storage,"
Any idea?

Okay, the usb-storage driver is already in use.

I suppose, it's not a NFS problem, but an USB connection problem.

In order to evaluate your case, please answer the following questions:

  1. has your USB drive its own power supply?
  2. is your USB drive connected to an USB2 port or an USB3 port of your router?
  3. is your USB drive 2.5" or 3.5"?
  4. which USB modules are loaded?
1 Like

Let me know


Edit 19/07/2020
After router reboot:
When I try to ssh router I get message:

@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
...
The fingerprint for the RSA key sent...

The USB disk is not auto mounted and I can't mount it manually.


I got the answer for the above.
For some reason, router rebooted on the second partition with OpenWrt 19.07.2, old one.

Then I tried deliberately to boot on OpenWRT 19.07.3. After few try the router started the old 19.07.2.
The 19.07.3 is broken. Very big issue happened, I have to reinstall it.

Edit:
Disconnected HDD and reboot on 19.7.3 worked. No need to reinstall but only fix the issue.

I see in the TOH, that the WRT32X has an USB2 and an USB3 port. You got the errors with the drive connected to the USB3 port. Do the errors also appear, when you connect the drive to the USB2 port of the WRT32X?

1 Like

As the USB2 is connected to a printer, I never used it with HDD.
I will try but it could be long to get a reliable answer working or not as the error came after almost 1TB write.

It's just for testing. A good USB2 connection will deliver a speed of about 30 MByte/sec. So to transfer 1 TB you'll need about 10 hours.

If you get the same errors, the harddrive itself could be the culprit or a faulty power supply. Otherwise (USB2 okay, USB3 error) the USB3 port and your USB3 harddrive don't play nicely together. In that case I would substitute your harddrive by a thumb drive or another harddrive.

1 Like

HDD connected to USB2, I tried to mount from CLI or GUI or reboot: No way
From GUI, HDD to USB3, I can mount, umount (when nfsd stopped). When connected to USB2, it is not possible.
I create another mount point and tried again. No more success.

HDD is not detected connected to USB2 .

If the HDD is not detected, it's not accessable and therefore can't be mounted. So let's find out the reason, why it's not detected.

Please post the output of the following command:
opkg list-installed | grep ^kmod-usb | sort

Perhaps there are some USB related packages missing.

~# opkg list-installed | grep ^kmod-usb | sort
kmod-usb-core - 4.14.180-1
kmod-usb-ehci - 4.14.180-1
kmod-usb-ohci - 4.14.180-1
kmod-usb-printer - 4.14.180-1
kmod-usb-storage - 4.14.180-1
kmod-usb-storage-extras - 4.14.180-1
kmod-usb2 - 4.14.180-1
kmod-usb3 - 4.14.180-1
kmod-usbip - 4.14.180-1
kmod-usbip-client - 4.14.180-1
kmod-usbip-server - 4.14.180-1

These packages should be sufficient for operating an USB harddisk.

What does the logfile say, when you connect the harddrive with the router's USB2 port? Please post the output of the logread command, starting with the timestamp of connecting the harddrive.

ls /dev, stop nfs service, disconnect HDD from USB3 and reconnect to USB2

~# ls /dev/sd*
/dev/sda   /dev/sda1

~# readlog
....
Wed Jul 22 11:29:22 2020 daemon.notice rpc.mountd[3344]: Caught signal 15, un-registering and exiting.
Wed Jul 22 11:29:22 2020 kern.warn kernel: [79091.224996] nfsd: last server has exited, flushing export cache
Wed Jul 22 11:30:01 2020 kern.info kernel: [79129.866638] usb 3-1: USB disconnect, device number 2
Wed Jul 22 11:30:01 2020 kern.notice kernel: [79129.908551] sd 2:0:0:0: [sda] Synchronizing SCSI cache
Wed Jul 22 11:30:01 2020 kern.info kernel: [79129.913918] sd 2:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Wed Jul 22 11:30:24 2020 kern.info kernel: [79152.936392] usb 1-1: new high-speed USB device number 2 using orion-ehci

~# ls /dev/sd*
ls: /dev/sd*: No such file or directory

There is something wrong with your USB3 harddrive. It is recognized as USB device, but not as USB storage device.

Each USB3 device should be downward compatible with USB2. It means: an USB3 device connected to an USB2 port should function the same way as connected to an USB3 port, just only with a lower speed.

Your USB3 harddrive doesn't. The reason can be:

  1. a defective USB2 port
  2. a defective USB3 harddrive
  3. both of them

To check your USB ports, connect an USB thumb drive or another USB harddrive with the USB ports. If the devices are recognized as USB storage devices, the USB ports are okay. As a consequence you can concentrate on your USB3 harddrive for the next investigation steps.

1 Like

Another, very similar USB3 disk/enclosure with another power supply connected to router USB3 port

Thu Jul 23 08:48:10 2020 kern.info kernel: [  451.162182] usb 3-1: new SuperSpeed USB device number 2 using xhci-hcd
Thu Jul 23 08:48:10 2020 kern.warn kernel: [  451.193217] usb 3-1: USB controller f10f8000.usb3 does not support streams, which are required by the UAS driver.
Thu Jul 23 08:48:10 2020 kern.warn kernel: [  451.203551] usb 3-1: Please try an other USB controller if you wish to use UAS.
Thu Jul 23 08:48:10 2020 kern.info kernel: [  451.210970] usb-storage 3-1:1.0: USB Mass Storage device detected
Thu Jul 23 08:48:10 2020 kern.info kernel: [  451.217362] usb-storage 3-1:1.0: Quirks match for vid 152d pid 0578: 1000000
Thu Jul 23 08:48:10 2020 kern.info kernel: [  451.224500] scsi host2: usb-storage 3-1:1.0
Thu Jul 23 08:48:11 2020 kern.notice kernel: [  452.262806] scsi 2:0:0:0: Direct-Access     JMicron  Tech             0203 PQ: 0 ANSI: 6
Thu Jul 23 08:48:18 2020 kern.notice kernel: [  459.501042] sd 2:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
Thu Jul 23 08:48:18 2020 kern.notice kernel: [  459.508578] sd 2:0:0:0: [sda] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
Thu Jul 23 08:48:18 2020 kern.notice kernel: [  459.516376] sd 2:0:0:0: [sda] 4096-byte physical blocks
Thu Jul 23 08:48:18 2020 kern.notice kernel: [  459.521883] sd 2:0:0:0: [sda] Write Protect is off
Thu Jul 23 08:48:18 2020 kern.debug kernel: [  459.526720] sd 2:0:0:0: [sda] Mode Sense: 47 00 00 08
Thu Jul 23 08:48:18 2020 kern.notice kernel: [  459.532071] sd 2:0:0:0: [sda] Disabling FUA
Thu Jul 23 08:48:18 2020 kern.notice kernel: [  459.536290] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Thu Jul 23 08:48:18 2020 kern.info kernel: [  459.692539]  sda: sda1
Thu Jul 23 08:48:18 2020 kern.notice kernel: [  459.695882] sd 2:0:0:0: [sda] Attached SCSI disk

~# ls /dev/sd*
/dev/sda   /dev/sda1

Disconnect

Thu Jul 23 08:53:29 2020 kern.info kernel: [  770.918209] usb 3-1: USB disconnect, device number 2
Thu Jul 23 08:53:29 2020 kern.notice kernel: [  770.956810] sd 2:0:0:0: [sda] Synchronizing SCSI cache
Thu Jul 23 08:53:29 2020 kern.info kernel: [  770.962102] sd 2:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00

Now connected to router USB2 port then few times disconnect/reconnect

Thu Jul 23 08:55:56 2020 kern.info kernel: [  917.772797] usb 1-1: new high-speed USB device number 3 using orion-ehci
Thu Jul 23 08:56:24 2020 kern.info kernel: [  945.161643] usb 1-1: USB disconnect, device number 3
Thu Jul 23 08:56:31 2020 kern.info kernel: [  952.185859] usb 1-1: new high-speed USB device number 4 using orion-ehci
Thu Jul 23 08:57:25 2020 kern.info kernel: [ 1006.465158] usb 1-1: USB disconnect, device number 4
Thu Jul 23 08:58:13 2020 kern.info kernel: [ 1054.166317] usb 1-1: new high-speed USB device number 5 using orion-ehci
Thu Jul 23 08:58:49 2020 kern.info kernel: [ 1090.335983] usb 1-1: USB disconnect, device number 5
Thu Jul 23 08:59:35 2020 kern.info kernel: [ 1136.305867] usb 1-1: new high-speed USB device number 6 using orion-ehci
Thu Jul 23 09:02:38 2020 kern.info kernel: [ 1319.682969] usb 1-1: USB disconnect, device number 6

However, a 2.5" USB3 connected to USB3 then to USB2 is mounted fine

Thu Jul 23 09:07:25 2020 kern.info kernel: [ 1606.406271] usb 3-1: new SuperSpeed USB device number 4 using xhci-hcd
Thu Jul 23 09:07:25 2020 kern.warn kernel: [ 1606.437118] usb 3-1: UAS is blacklisted for this device, using usb-storage instead
Thu Jul 23 09:07:25 2020 kern.info kernel: [ 1606.444742] usb-storage 3-1:1.0: USB Mass Storage device detected
Thu Jul 23 09:07:25 2020 kern.info kernel: [ 1606.451100] usb-storage 3-1:1.0: Quirks match for vid 357d pid 7788: 4800000
Thu Jul 23 09:07:25 2020 kern.info kernel: [ 1606.458264] scsi host2: usb-storage 3-1:1.0
Thu Jul 23 09:07:26 2020 kern.notice kernel: [ 1607.486966] scsi 2:0:0:0: Direct-Access     WDC WD50 00BEVT-00A0RT0   0100 PQ: 0 ANSI: 6
Thu Jul 23 09:07:26 2020 kern.notice kernel: [ 1607.495648] sd 2:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Thu Jul 23 09:07:26 2020 kern.notice kernel: [ 1607.503624] sd 2:0:0:0: [sda] Write Protect is off
Thu Jul 23 09:07:26 2020 kern.debug kernel: [ 1607.508521] sd 2:0:0:0: [sda] Mode Sense: 67 00 10 08
Thu Jul 23 09:07:26 2020 kern.notice kernel: [ 1607.513922] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Thu Jul 23 09:07:26 2020 kern.info kernel: [ 1607.539786]  sda: sda1
Thu Jul 23 09:07:26 2020 kern.notice kernel: [ 1607.543103] sd 2:0:0:0: [sda] Attached SCSI disk
Thu Jul 23 09:09:06 2020 kern.info kernel: [ 1708.080269] usb 3-1: USB disconnect, device number 4
Thu Jul 23 09:09:06 2020 kern.notice kernel: [ 1708.122434] sd 2:0:0:0: [sda] Synchronizing SCSI cache
Thu Jul 23 09:09:06 2020 kern.info kernel: [ 1708.127664] sd 2:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Thu Jul 23 09:09:14 2020 kern.info kernel: [ 1715.474639] usb 1-1: new high-speed USB device number 9 using orion-ehci
Thu Jul 23 09:09:14 2020 kern.warn kernel: [ 1715.676359] usb 1-1: UAS is blacklisted for this device, using usb-storage instead
Thu Jul 23 09:09:14 2020 kern.info kernel: [ 1715.683996] usb-storage 1-1:1.0: USB Mass Storage device detected
Thu Jul 23 09:09:14 2020 kern.info kernel: [ 1715.690236] usb-storage 1-1:1.0: Quirks match for vid 357d pid 7788: 4800000
Thu Jul 23 09:09:14 2020 kern.info kernel: [ 1715.697370] scsi host2: usb-storage 1-1:1.0
Thu Jul 23 09:09:15 2020 kern.notice kernel: [ 1716.725587] scsi 2:0:0:0: Direct-Access     WDC WD50 00BEVT-00A0RT0   0100 PQ: 0 ANSI: 6
Thu Jul 23 09:09:15 2020 kern.notice kernel: [ 1716.734685] sd 2:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Thu Jul 23 09:09:15 2020 kern.notice kernel: [ 1716.742810] sd 2:0:0:0: [sda] Write Protect is off
Thu Jul 23 09:09:15 2020 kern.debug kernel: [ 1716.747685] sd 2:0:0:0: [sda] Mode Sense: 67 00 10 08
Thu Jul 23 09:09:15 2020 kern.notice kernel: [ 1716.753313] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Thu Jul 23 09:09:15 2020 kern.info kernel: [ 1716.775204]  sda: sda1
Thu Jul 23 09:09:15 2020 kern.notice kernel: [ 1716.780451] sd 2:0:0:0: [sda] Attached SCSI disk
Thu Jul 23 09:10:37 2020 kern.info kernel: [ 1798.612314] usb 1-1: USB disconnect, device number 9
Thu Jul 23 09:10:37 2020 kern.notice kernel: [ 1798.651934] sd 2:0:0:0: [sda] Synchronizing SCSI cache
Thu Jul 23 09:10:37 2020 kern.info kernel: [ 1798.657387] sd 2:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00

Router or disk defective? Hard to say!

The logs of your 2.5" harddrive look fine: that's the normal behaviour of an USB storage device. And: the USB ports of your WRT32X seem to be okay.

The logs of your 3.5" harddrives are problematic. Ideally they should look like the logs for the 2.5" harddrive.

The JMicron SATA2USB3 chip (USB-Id 152d:0578) is cheap, widespread and awfull. Don't use it with UAS. Google for the USB-Id and you'll find a lot of reports. Perhaps your first 3.5" harddrive uses the same chip.

Last resort: remove the package kmod-usb-storage-extras from your router. You must reboot your router to make the change effective.

Otherwise there is my hint in a previous post:

But I don't know, whether there is a place in the WRT32X, where you can manipulate the kernel command line. Perhaps somebody else can jump in.

Your NFS problem is a consequence of your USB problem. Solve the latter and the first will vanish.

I removed it. I'll see with time if this fix the issue.
Thank you for your help

Edit:
Could you let me know if this is normal that from now openWrt server run only nfs v3. It was running v4 as well previously.

Sorry, I can't answer your question. Perhaps somebody else can jump in.

I configured all my NFS servers and clients as v3, therefore having no experiences with NFS v4.