
What do you do when a path to an NFS datastore goes down?

My QA environment uses Isilon as its backend storage platform. The Isilon share is one large storage volume that is set up with different paths (IPs) to the storage. This environment has six datastores (six paths). The other day, one of those paths went down, taking approximately 75 VMs with it.

I debated attempting to remove the down datastore and then re-add it with the same name (pointing at a different IP), but I didn't know how that would affect the VMs or whether they'd come back cleanly. In the end, I decided to go down a different path…

While my storage engineer troubleshot the issue on their side, I decided to spin up another datastore using a different IP. I then modified my script for migrating VMs between vCenters to pull a list of VMs on that specific datastore, power them off, remove them from inventory on the down datastore, re-register them on the newly created datastore, and power them back on. I left the "update-tools" line in as well, since many of the VMs were running outdated Tools versions. A few extra minutes of downtime in a QA environment won't hurt anything.
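The core of that flow can be sketched in PowerCLI as below. This is a minimal illustration, not the actual script: the datastore names (`Down-DS`, `New-DS`) are placeholders, and it assumes you're already connected to vCenter and that both datastores point at the same Isilon volume, so each VM's files are reachable through the new mount.

```powershell
# Hypothetical datastore names; substitute your own
$oldDs = "Down-DS"   # datastore on the failed path
$newDs = "New-DS"    # replacement datastore on a working IP

foreach ($vm in Get-Datastore -Name $oldDs | Get-VM) {
    # Capture the .vmx path and host before unregistering
    $vmxPath = $vm.ExtensionData.Config.Files.VmPathName
    $esxHost = $vm.VMHost

    Stop-VM   -VM $vm -Confirm:$false   # power off
    Remove-VM -VM $vm -Confirm:$false   # remove from inventory (files stay on disk)

    # Re-register the same .vmx under the new datastore name
    $newPath = $vmxPath -replace [regex]::Escape("[$oldDs]"), "[$newDs]"
    $newVm = New-VM -VMFilePath $newPath -VMHost $esxHost

    Start-VM     -VM $newVm
    Update-Tools -VM $newVm -NoReboot   # refresh outdated VMware Tools
}
```

Note that `Remove-VM` without `-DeletePermanently` only unregisters the VM, which is what makes re-adding it from the new datastore safe.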

You'll notice a section of the script where it pauses… I did this so I could edit the migrate.csv and change the datastore name. This was easier than taking the time to hardcode the new datastore name into the script. However, if you prefer, you can modify line 109 with the actual datastore name, and the pause can be removed. The modified line is here: