Disk Annualized Failure Rate (AFR)

Oracle Database Tips by Donald Burleson

Doug Burns noted this paper from Google relating to disk failures, a very
interesting study. Unlike the traditional measures of Mean Time Between
Failures (MTBF) and Mean Time to Failure (MTTF), this study uses the
Annualized Failure Rate (AFR), and it also attempts to validate the value of
SMART (Self-Monitoring, Analysis and Reporting Technology) for predicting
disk failure. Interestingly, SMART is similar to proprietary predictive
models for Oracle failures, using scientific correlations to warn of
failures before they occur.
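To make the difference between MTBF and AFR concrete, here is a minimal Python sketch. It is not taken from the study: the function names are illustrative, and the MTBF conversion assumes a constant failure rate (an exponential lifetime model), which the infant mortality findings below suggest is only an approximation.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours

def afr_from_mtbf(mtbf_hours: float) -> float:
    """AFR implied by an MTBF figure, ASSUMING a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

def observed_afr(failures: int, drive_days: float) -> float:
    """AFR observed in a population: failures per drive-year of service."""
    drive_years = drive_days / 365.25
    return failures / drive_years

if __name__ == "__main__":
    # A vendor-style MTBF of 1,000,000 hours (hypothetical figure)
    print(f"{afr_from_mtbf(1_000_000):.4%}")
    # 10 failures among 1,000 drives each in service for one year (hypothetical)
    print(f"{observed_afr(10, 1000 * 365.25):.4%}")
```

Under that assumption, an MTBF of 1,000,000 hours works out to an AFR a little under 0.9% per year, which is why a measured AFR is a useful cross-check on a quoted MTBF.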

The study claims to be one of the largest and most comprehensive studies
of disk drive failures, and it highlights the importance of redundancy in
disk technology. The paper concludes:

Heat does not matter - Higher
temperatures were not correlated with higher disk failure rates.

Early warnings count for predicting
disk failure - Checking the syslogs for sporadic I/O errors has high
predictive value: "After their first scan error, drives are 39 times
more likely to fail within 60 days than drives with no such errors."
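As a rough illustration of checking the syslogs for sporadic I/O errors, the following Python sketch counts I/O error messages per device. The message pattern is an assumption (Linux kernel messages of the form "I/O error, dev sda, sector ..."), and the function name is hypothetical; adjust the regular expression to match what your own systems actually log.

```python
import re
from collections import Counter

# ASSUMED message format: "... I/O error, dev sda, sector 12345"
IO_ERROR = re.compile(r"I/O error, dev (\w+)")

def count_io_errors(lines):
    """Return a Counter mapping device name to the number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical log lines for demonstration only
    sample = [
        "Jan  1 00:00:01 host kernel: blk_update_request: I/O error, dev sdb, sector 2048",
        "Jan  1 00:00:02 host sshd[123]: Accepted publickey for oracle",
        "Jan  1 00:05:10 host kernel: blk_update_request: I/O error, dev sdb, sector 4096",
    ]
    for device, n in count_io_errors(sample).items():
        print(device, n)
```

In practice the lines would come from a log file such as /var/log/messages, and any device with a non-zero count is a candidate for closer attention.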

SMART is not predictive - The study
noted that SMART (Self-Monitoring, Analysis and Reporting
Technology) data alone did not provide statistically significant
correlations for predictive benefits. However, some SMART values have
more predictive value than others:

"Some SMART parameters (scan
errors, reallocation counts, offline reallocation counts, and
probational counts) have a large impact on failure probability.
Given the lack of occurrence of predictive SMART signals on a large
fraction of failed drives, it is unlikely that an accurate
predictive failure model can be built based on these signals alone."
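The four parameters named in the quote can be checked with a simple filter. This Python sketch is only an illustration: the counter names and the dictionary layout are hypothetical, and reading the real values from a drive is left out because it depends on your SMART tooling.

```python
# Hypothetical counter names for the four parameters named in the study
PREDICTIVE_SIGNALS = (
    "scan_errors",
    "reallocation_count",
    "offline_reallocation_count",
    "probational_count",
)

def drives_with_signals(smart_readings):
    """Map each drive to the list of predictive counters that are non-zero.

    smart_readings: dict of drive name -> dict of counter name -> value.
    Drives with no non-zero predictive counters are omitted.
    """
    flagged = {}
    for drive, counters in smart_readings.items():
        nonzero = [name for name in PREDICTIVE_SIGNALS if counters.get(name, 0) > 0]
        if nonzero:
            flagged[drive] = nonzero
    return flagged

if __name__ == "__main__":
    # Hypothetical readings for demonstration only
    readings = {
        "sda": {"scan_errors": 0, "reallocation_count": 0},
        "sdb": {"scan_errors": 3, "reallocation_count": 12},
    }
    print(drives_with_signals(readings))
```

As the quote warns, a drive that is not flagged is not necessarily healthy, since a large fraction of failed drives showed none of these signals.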

Infant mortality - The study
suggests that disks show a form of infant mortality: "It is interesting
to note that our 3-month, 6-months and 1-year data points do seem to
indicate a noticeable influence of infant mortality phenomena, with
1-year AFR dropping significantly from the AFR observed in the first
three months."

[Figure: Google study - disk failure rate and disk age]

Disk utilization factor -
The study showed that high utilization is clearly a failure factor
for young disks, which seems similar to the old "burn-in" tests
on motherboards. While we might expect high-utilization disks
to have a higher failure rate at every age, the study noted that
3-year-old disks had a higher failure rate for low-utilization
spindles.
