Sunday, 22 December 2013

Finding MTBF of One Disk to Fail Amongst an Estate of Many using PowerShell

Here is a little script I cooked up to work out an MTBF-based time
for one disk to fail amongst your entire Clustered ONTAP storage
estate.

The theory behind the script is slightly shaky. Pretty
much the only figures we have available for working out how regularly
we might expect a disk to fail in our estate are the published MTBF figures.
For the calculation, I've taken the view (probably wrong) that if one disk
has an MTBF of 2 million hours, then 2 disks will between them have an MTBF
of 1 million hours, and so on …
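
Put as a formula, that assumption looks roughly like the sketch below. It treats every disk as failing independently at a constant rate of 1/MTBF, which is an idealisation rather than anything the published figures promise. The symbols N and MTBF_N are illustrative labels, not names taken from the script, and the second line is a worked figure for a hypothetical estate of 2000 identical disks.

```latex
\[
\mathrm{MTBF}_{N} = \frac{\mathrm{MTBF}_{1}}{N}
\]
\[
\frac{2\,000\,000\ \text{hours}}{2000\ \text{disks}} = 1000\ \text{hours} \approx 41.7\ \text{days}
\]
```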

Firstly, you'll want to set up PowerShell to connect to
all your clusters (you might consider using my CDOT PowerShell connections
manager from this
post to do that). Then copy the script below into Notepad or Notepad++,
save it as, say, mtbf.ps1, and load the function into PowerShell by
dot-sourcing the script (remember the space between the two dots):

. .\mtbf.ps1

Finally, run the function using:

mtbf

The script scans all your clusters for the various types of
disk, working out for each type an MTBF-based time for one disk of that type
to fail in the estate. Then, the pièce de résistance: it combines the figures
for all the disk types into a single MTBF-based time for one disk to fail in
the entire estate.
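
One plausible way to combine the per-type figures is sketched below: add up the failure rates (disk count divided by MTBF) for each type, then take the reciprocal. This is an assumption about the arithmetic, not a copy of the script, and the function name Get-EstateMtbf, the property names, the disk counts and the MTBF values are all hypothetical.

```powershell
# Hypothetical inventory: counts and MTBF figures are illustrative only,
# not taken from any real cluster or vendor data sheet.
$inventory = @(
    [pscustomobject]@{ Type = 'TypeA'; Count = 1000; MtbfHours = 2000000 }
    [pscustomobject]@{ Type = 'TypeB'; Count = 1000; MtbfHours = 1000000 }
)

# Hypothetical helper: combines per-type failure rates into one estate-wide figure.
function Get-EstateMtbf {
    param([object[]]$DiskTypes)

    $totalRate = 0.0
    foreach ($d in $DiskTypes) {
        # Skip types with no disks so they contribute nothing to the rate.
        if ($d.Count -gt 0) {
            # Failures per hour for this type, assuming a constant rate of 1/MTBF per disk.
            $totalRate += $d.Count / $d.MtbfHours
        }
    }

    # No disks found at all: nothing sensible to report.
    if ($totalRate -eq 0) { return $null }

    $hours = 1 / $totalRate
    [pscustomobject]@{
        Hours = [math]::Round($hours, 1)
        Days  = [math]::Round($hours / 24, 1)
    }
}

Get-EstateMtbf -DiskTypes $inventory
```

With these made-up numbers the combined figure comes out at about 667 hours, or roughly 28 days, which is shorter than either disk type would give on its own.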

The Script

### START OF SCRIPT - mtbf V1.3b ###

FUNCTION mtbf {

<# The following two hashed lines contain all disk types. So
as not to waste cluster CPU cycles searching for stuff that isn't there, RECOMMEND
reducing the list by removing the disk types you know you definitely don't
have. #>

$mtbfOutput += "MTBF based time for one disk to fail in the entire disk estate is $diskRecordTotalHours hours ($diskRecordTotalDays days)."

$mtbfOutput += " "

return $mtbfOutput
}

### END OF SCRIPT ###

An Example Output

As an example of the script in action, the figure below is not unreasonable
for an estate of more than 2000 disks! In reality we'd expect a figure a fair
bit lower than the one given (I did say the theory behind the script was a bit
shaky), but as a curiosity it serves its purpose.

Image: An MTBF-based calculation of 1 disk to fail amongst an estate of many!