We are experiencing persistent I/O request timeouts on Linux with P3520/P4600 SSDs. We have tried multiple different kernels (3.10, 4.4, 4.9) and see the timeouts on all of them. The P4600 seems to be more prone to these than the P3520 though we see them on the latter as well. We have the latest firmware installed on both drives which are housed in the same machine (Supermicro 5018R-WR with X10SRW-F motherboard and E5-1650 V4 CPU). We can reproduce the timeouts by simply running mkfs -t xfs on the drive.
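
For anyone trying to confirm the same reproduction, here is a minimal sketch of how the timeouts could be counted around the mkfs run. It assumes the kernel logs NVMe timeouts on lines that contain both "nvme" and "timeout" (the exact wording differs between kernel versions). The helper name count_nvme_timeouts and the device path are placeholders, and mkfs destroys any data on the target device.

```shell
#!/bin/sh
# Hypothetical helper: count kernel log lines that mention an NVMe timeout.
# Reads log text on stdin, so it works with dmesg output or a saved log file.
count_nvme_timeouts() {
    awk 'BEGIN { n = 0 }
         tolower($0) ~ /nvme/ && tolower($0) ~ /timeout/ { n++ }
         END { print n }'
}

# Example (destructive, run as root; DEV is an assumption, adjust as needed):
#   DEV=/dev/nvme1n1
#   before=$(dmesg | count_nvme_timeouts)
#   mkfs -t xfs -f "$DEV"
#   after=$(dmesg | count_nvme_timeouts)
#   echo "new timeout lines: $((after - before))"
```

The before/after difference is only a rough indicator, since the kernel ring buffer can wrap on a busy machine and drop older lines.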

Here is the output from isdct (version isdct-3.0.9.400-17.x86_64):

- Intel SSD DC P3520 Series CVPF717100L01P2JGN -

Bootloader : MB1B0105

DevicePath : /dev/nvme0n1

DeviceStatus : Healthy

Firmware : MDV10271

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 0

ModelNumber : INTEL SSDPEDMX012T7

ProductFamily : Intel SSD DC P3520 Series

SerialNumber : CVPF717100L01P2JGN

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Bootloader : 0110

DevicePath : /dev/nvme1n1

DeviceStatus : Healthy

Firmware : QDV10150

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

In order to further understand the issue, your system, and the troubleshooting that you have performed, could you please answer the following questions?

How many SSDs of each model are experiencing the timeout issue? How many SSDs do you have of each series?

Have you tried connecting the SSDs to different slots? Have you been able to test the drives in another motherboard or system?

Are you using any kind of RAID controller?

Regarding the Intel® SSD DC P4600 Series, a new firmware version will tentatively be available within the next couple of weeks as part of the latest Intel® Solid State Drive Data Center Tool version, so please keep checking the download link https://downloadcenter.intel.com/download/27248?v=t, update your firmware and test again.

I’ll be waiting for your response.

Regards,
Andres V.

1. We have seen this issue on at least 3 P4600s and 2 P3520s. We have a total of 8 P4600s and 4 P3520s. We are in the process of replacing the P3520s with P4600s as the workload has proven to be more write-intensive than originally anticipated.

2. We have seen the timeouts on both Supermicro (X10SRW-F) and Intel (S2600WT) systems which suggests the motherboard model is not a factor here. In our test Supermicro machine, we have the 2 SSDs installed in separate PCIe slots and both exhibit timeouts which would seem to suggest that changing the PCIe slot is not likely to resolve the issue.

3. No, we are not using any RAID controller and access the drive directly via /dev/nvme* devices.

4. We will certainly try the new firmware once it is released. If you can provide us a beta version to test sooner we would be happy to do so as well.

I power-cycled the system and tried running the firmware load again, but the tool still reports the drive as having the latest firmware. When I first downloaded and ran isdct 3.0.10, it did report newer firmware and successfully updated the drive. All commands were run as root.

Here is the version of the tool:

# isdct version

- Version Information -

Name: Intel(R) Data Center Tool

Version: 3.0.10

Description: Interact and configure Intel SSDs.

When I attempt to load the firmware now, this is the output I get from the tool:

# isdct load -intelssd 1

WARNING! You have selected to update the drives firmware!

Proceed with the update? (Y|N): Y

Updating firmware...

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Status : The selected Intel SSD contains current firmware as of this tool release.

# isdct show -intelssd 1

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Bootloader : 0122

DevicePath : /dev/nvme1n1

DeviceStatus : Healthy

Firmware : QDV10170

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

There seems to be a software compatibility issue that may be causing this, because as you can see in the following image, the Intel® SSD Data Center Tool is supported for the following operating systems, and RHEL 6.9 is not one of those:

Do you have access to a PC with any of the listed operating systems? Could you please try again to install the latest firmware using the official tool?

It’s important for us to find out if version QDV10190 solves the issue you are experiencing.

RHEL 6.6 is very old (released in 2014) and we have long since upgraded our systems to 6.9, so I am unable to test on that release. I am surprised your tool releases have not kept up with vendor OS releases. Both isdct versions 3.0.9 and 3.0.10 did update the firmware to a newer release without complaint, so it is not clear what the nature of the incompatibility is, since the tool itself does not print any message indicating one.

Are you referring to an update to firmware version QDV10170 or to firmware version QDV10190? Have you been able to update the SSDs that do not show the persistent I/O request timeouts? Do you have any Intel® SSD DC P4600 with firmware version QDV10190?

I was referring to the fact that on the test machine we initially used isdct 3.0.9 to upgrade the P4600 firmware from QDV10130 to QDV10150, and subsequently isdct 3.0.10 to upgrade it from QDV10150 to QDV10170. As shown in the output I posted above, isdct 3.0.10 treats QDV10170 as the latest available firmware revision and states that the drive already has it installed. It does not report QDV10190 as being available. Could there be a discrepancy in the firmware revision between the documentation and the tool itself?
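
To make it easier to compare firmware revisions across several drives, the isdct output can be reduced to one line per drive. This is only a sketch: it assumes the output layout matches what is posted in this thread (a "- ... SERIAL -" header line followed by "Firmware : VERSION"), and the helper name firmware_summary is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: summarise "isdct show -intelssd" output read on stdin
# as one "serial firmware" pair per drive.
firmware_summary() {
    awk '
        # Header line, e.g. "- Intel SSD DC P4600 Series BTLE... -":
        # the serial number is the last word before the closing dash.
        $1 == "-" && $NF == "-" && NF > 2 { drive = $(NF - 1) }
        # Exact match on "Firmware" skips "FirmwareUpdateAvailable".
        $1 == "Firmware" && $2 == ":"     { print drive, $3 }
    '
}

# Example (run as root):
#   isdct show -intelssd | firmware_summary
```

Running this on each machine and collecting the results would show at a glance which drives are on QDV10130, QDV10150 or QDV10170.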

The P4600s we tried deploying in production have firmware QDV10130 and they all exhibit the timeouts, so until this issue is resolved these drives are unusable for us. We have had great success with your SATA SSDs (S3700, S3600, S3610, S3520) on various versions of the OS and Linux kernels, which is why we purchased their NVMe counterparts. As of now, however, the experience with them has been a disappointing one, so we would really appreciate help in resolving the issue.