How replacing an NVMe card on an S2D cluster caused me a lot of headaches

A week ago I replaced an NVMe card in our development Storage Spaces Direct (S2D) cluster. It did not go as gracefully as I had hoped. Normally the procedure works like this:

1. Pause the node and drain all resources from it
2. Shut down the server
3. Replace the NVMe card
4. Power the server back on
5. Resume the node
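The normal procedure can be sketched in PowerShell. This is a minimal sketch, not the exact commands I ran. The function names `Test-AllVirtualDisksHealthy`, `Start-DiskReplacement` and `Complete-DiskReplacement` are hypothetical, and the script assumes the FailoverClusters and Storage modules are available, as they are on S2D nodes. The health check before pausing is an added precaution and not part of the steps above: pausing a node while a virtual disk is already degraded removes even more redundancy.

```powershell
# Sketch only: function names are hypothetical; needs the FailoverClusters and
# Storage modules (present on S2D cluster nodes).

# Returns $true only when every virtual disk passed in reports HealthStatus 'Healthy'.
function Test-AllVirtualDisksHealthy {
    param([object[]]$VirtualDisk)
    $notHealthy = @($VirtualDisk | Where-Object { $_.HealthStatus -ne 'Healthy' })
    return ($notHealthy.Count -eq 0)
}

# Steps 1-2: pause and drain the node, then shut it down.
function Start-DiskReplacement {
    param([Parameter(Mandatory)][string]$NodeName)
    # Added precaution: refuse to continue if any virtual disk is already degraded.
    if (-not (Test-AllVirtualDisksHealthy -VirtualDisk (Get-VirtualDisk))) {
        throw 'Not all virtual disks are Healthy; let repairs finish before pausing a node.'
    }
    Suspend-ClusterNode -Name $NodeName -Drain -Wait
    Stop-Computer -ComputerName $NodeName -Force
}

# Steps 4-5: once the card is swapped and the server is back up, resume the node,
# then show running storage jobs and the virtual disk state.
function Complete-DiskReplacement {
    param([Parameter(Mandatory)][string]$NodeName)
    Resume-ClusterNode -Name $NodeName -Failback Immediate
    Get-StorageJob
    Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
}
```

Usage would be `Start-DiskReplacement -NodeName 'S2D-NODE01'` (a hypothetical node name), then swap the card and power the server on, then run `Complete-DiskReplacement -NodeName 'S2D-NODE01'`. Running `Get-StorageJob` a few more times afterwards shows whether the repair jobs actually finish.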

This time it did not go as planned, and I ended up with quite a lot of issues. It was late on a Saturday evening, and I ended up with disks that looked like this.

At this point I was worried, so I sent a few Twitter messages to someone in the Microsoft S2D group whom I had talked with before.

The new NVMe card showed up fine, but all the other disks were in a Lost Communication state. I tried replacing all the disks to see if that helped. In a way it did, but the virtual disk was still degraded.

On Monday I got a reply asking me to contact the S2D group by email. I gave them all the information they needed, and they verified that this was a bug, fixed in KB3197954. After installing that update on all nodes and rebooting, the rebuild on the server started fine. So make sure you have the latest patches installed before doing any maintenance on your cluster.
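One way to confirm an update is present on every node before touching the hardware is `Get-HotFix`. This is a minimal sketch, assuming it runs on a cluster node that can query the other nodes remotely. `Get-NodeMissingHotfix` is a hypothetical helper, not part of any module. A node that cannot be queried is reported as missing too, which errs on the safe side.

```powershell
# Hypothetical helper: return the names of nodes on which the given update
# could not be found (unreachable nodes are returned as well).
function Get-NodeMissingHotfix {
    param(
        [Parameter(Mandatory)][string[]]$NodeName,
        [Parameter(Mandatory)][string]$HotfixId
    )
    foreach ($node in $NodeName) {
        # Get-HotFix writes an error when the update is absent; silence it and test the result.
        $hit = Get-HotFix -Id $HotfixId -ComputerName $node -ErrorAction SilentlyContinue
        if (-not $hit) { $node }
    }
}

# Usage on a cluster node (not run here):
#   Get-NodeMissingHotfix -NodeName (Get-ClusterNode).Name -HotfixId 'KB3197954'
```

An empty result means every node reported the update. After patching and rebooting, `Get-StorageJob` shows whether the rebuild has started.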

Now for some Storage Spaces Direct PowerShell commands I got from the S2D team that I'd like to share with you all.

If a drive shows Lost Communication after a reboot (or for some other reason) and you know the drive is working, run this command:

Repair-ClusterS2D -RecoverUnboundDrives
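Before running the repair, it can help to see exactly which drives are affected. This is a minimal sketch, assuming the Storage module on a cluster node. `Select-LostCommunicationDisk` is a hypothetical helper name, not part of the Storage or FailoverClusters modules. It uses `-contains` because a disk's `OperationalStatus` can hold more than one value.

```powershell
# Hypothetical helper: pass through only the disks whose OperationalStatus
# includes 'Lost Communication'. -contains also works when the status is a
# single string rather than an array.
function Select-LostCommunicationDisk {
    param([Parameter(ValueFromPipeline)]$Disk)
    process {
        if ($Disk.OperationalStatus -contains 'Lost Communication') { $Disk }
    }
}

# Usage on a cluster node (not run here):
#   Get-PhysicalDisk | Select-LostCommunicationDisk |
#       Format-Table FriendlyName, SerialNumber, OperationalStatus, HealthStatus
#   Repair-ClusterS2D -RecoverUnboundDrives
```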

To optimize your storage pool, use this command. It rebalances the data evenly across your disks:

Optimize-StoragePool -FriendlyName 'storagepoolname' -Verbose
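If you don't remember the pool's friendly name, you can let PowerShell find it. This is a sketch, assuming the Storage module on a cluster node. `Invoke-PoolRebalance` is a hypothetical name. It skips the primordial pool, which is the built-in default pool rather than the S2D pool that holds your virtual disks.

```powershell
# Hypothetical helper: rebalance every non-primordial storage pool.
# Requires the Storage module (present on S2D cluster nodes).
function Invoke-PoolRebalance {
    # -IsPrimordial $false skips the built-in primordial pool.
    Get-StoragePool -IsPrimordial $false | Optimize-StoragePool -Verbose
}

# Usage on a cluster node (not run here):
#   Invoke-PoolRebalance
#   # From a second PowerShell window, watch progress:
#   Get-StorageJob | Select-Object Name, JobState, PercentComplete
```

The rebalance can run for a long time on a full pool, which is why the progress check is meant for a second window.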

To get some stats from your S2D cluster, such as IOPS, latency, disk speed and so on, use this little command:

Get-StorageSubSystem Clus* | Get-StorageHealthReport

This little set of commands pulls a lot of information out of your S2D cluster, which you can forward to the S2D team or browse through yourself 🙂