• Given the lack of occurrence of predictive SMART signals on a large fraction of failed drives, it is unlikely that an accurate predictive failure model can be built based on these signals alone.

The observed range of AFRs [Annualized failure rates] (see Figure 2) varies from 1.7%, for drives that were in their first year of operation, to over 8.6%, observed in the 3-year old population. The higher baseline AFR for 3 and 4 year old drives is more strongly influenced by the underlying reliability of the particular models in that vintage than by disk drive aging effects. It is interesting to note that our 3-month, 6-months and 1-year data points do seem to indicate a noticeable influence of infant mortality phenomena, with 1-year AFR dropping significantly from the AFR observed in the first three months.
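For anyone unfamiliar with the metric, an annualized failure rate can be sketched as failures divided by drive-years of exposure. This is only a rough illustration: the function name and the sample numbers are made up, and the paper may use a different exact definition.

```python
def annualized_failure_rate(failures, drive_days):
    """Failures per drive-year of operation, expressed as a percentage."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years

# Hypothetical population (not from the paper):
# 1,000 drives observed for 90 days each, with 10 failures in that window.
afr = annualized_failure_rate(failures=10, drive_days=1000 * 90)
print(f"AFR: {afr:.2f}%")
```

Annualizing this way is what lets a 3-month data point be compared directly with a 1-year one.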

Failure rates are known to be highly correlated with drive models, manufacturers and vintages [18]. Our results do not contradict this fact. For example, Figure 2 changes significantly when we normalize failure rates per each drive model. Most age-related results are impacted by drive vintages. However, in this paper, we do not show a breakdown of drives per manufacturer, model, or vintage due to the proprietary nature of these data. Interestingly, this does not change our conclusions. In contrast to age-related results, we note that all results shown in the rest of the paper are not affected significantly by the population mix. None of our SMART data results change significantly when normalized by drive model. The only exception is seek error rate, which is dependent on one specific drive manufacturer...

we expected to notice a very strong and consistent correlation between high utilization and higher failure rates. However our results appear to paint a more complex picture. First, only very young and very old age groups appear to show the expected behavior. After the first year, the AFR of high utilization drives is at most moderately higher than that of low utilization drives. The three-year group in fact appears to have the opposite of the expected behavior, with low utilization drives having slightly higher failure rates than high utilization ones. One possible explanation for this behavior is the survival of the fittest theory. It is possible that the failure modes that are associated with higher utilization are more prominent early in the drive’s lifetime.

...failures do not increase when the average temperature increases. In fact, there is a clear trend showing that lower temperatures are associated with higher failure rates. Only at very high temperatures is there a slight reversal of this trend. Figure 5 looks at the average temperatures for different age groups. The distributions are in sync with Figure 4 showing a mostly flat failure rate at mid-range temperatures and a modest increase at the low end of the temperature distribution. What stands out are the 3 and 4-year old drives, where the trend for higher failures with higher temperature is much more constant and also more pronounced. [Min failures at "sweet spot" of 30 - 46 deg C]

we see a drastic and quick decrease in survival probability after the first scan error (left graph). A little over 70% of the drives survive the first 8 months after their first scan error.

[Reallocations] ...survival probability after the first reallocation. We truncate the graph to 8.5 months, due to a drastic decrease in the confidence levels after that point. In general, the left graph shows, about 85% of the drives survive past 8 months after the first reallocation. The effect is more pronounced (middle graph) for drives in the age ranges [10,20) and [20, 60] months, while newer drives in the range [0,5) months suffer more than their next generation. This could again be due to infant mortality effects, although it appears to be less drastic in this case than for scan errors. After their first reallocation, drives are over 14 times more likely to fail within 60 days than drives without reallocation counts, making the critical threshold for this parameter also one.

After the first offline reallocation, drives have over 21 times higher chances of failure within 60 days than drives without offline reallocations; an effect that is again more drastic than total reallocations.
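A "times more likely to fail within 60 days" figure like the ones above can be read as a ratio of two failure fractions. The sketch below shows that arithmetic only; the function name and the counts are hypothetical and are not taken from the paper.

```python
def relative_failure_ratio(failed_with, total_with, failed_without, total_without):
    """Ratio of the failure fraction among drives showing an error
    to the failure fraction among drives without it."""
    if total_with <= 0 or total_without <= 0:
        raise ValueError("group sizes must be positive")
    if failed_without <= 0:
        raise ValueError("need at least one failure in the comparison group")
    p_with = failed_with / total_with
    p_without = failed_without / total_without
    return p_with / p_without

# Hypothetical counts: 42 of 1,000 drives with a reallocation fail within 60 days,
# versus 3 of 1,000 drives without one.
ratio = relative_failure_ratio(42, 1000, 3, 1000)
print(f"{ratio:.1f}x more likely to fail")
```

Note that a large ratio can still coexist with many failed drives showing no error at all, which is the point the summary makes about prediction.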

for drives aged up to two years, this is true, there is no significant correlation between failures and high power cycles count. But for drives 3 years and older, higher power cycle counts can increase the absolute failure rate by over 2%. We believe this is due more to our population mix than to aging effects. Moreover, this correlation could be the effect (not the cause) of troubled machines that require many repair iterations and thus many power cycles to be fixed.

Vibration: This is not a parameter that is part of the SMART set, but it is one that is of general concern in designing drive enclosures as most manufacturers describe how vibration can affect both performance and reliability of disk drives. Unfortunately we do not have sensor information to measure this effect directly for drives in service.

[Failure Prediction] ...even when we add all remaining SMART parameters (except temperature) we still find that over 36% of all failed drives had zero counts on all variables. This population includes seek error rates, which we have observed to be widespread in our population (> 72% of our drives have it) which further reduces the sample size of drives without any errors.

We conclude that it is unlikely that SMART data alone can be effectively used to build models that predict failures of individual drives. SMART parameters still appear to be useful in reasoning about the aggregate reliability of large disk populations.

In our study, we did not find much correlation between failure rate and either elevated temperature or utilization. It is the most surprising result of our study. Our annualized failure rates were generally higher than those reported by vendors, and more consistent with other user experience studies.

Conclusions

We find, for example, that after their first scan error, drives are 39 times more likely to fail within 60 days than drives with no such errors. First errors in reallocations, offline reallocations, and probational counts are also strongly correlated to higher failure probabilities. Despite those strong correlations, we find that failure prediction models based on SMART parameters alone are likely to be severely limited in their prediction accuracy, given that a large fraction of our failed drives have shown no SMART error signals whatsoever.

Sorry, big summary, but interesting points there.

Regards,
Martin

[edit] Formatting [/edit]

Yes, very bad pun alert!:-b
____________
See new freedom: Mageia4Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Spent the day trying to get the server based upon the Intel motherboard, the anonymously donated processors and RAM up and running and hit a snag. Despite looking like it would fit in our Intel chassis, it won't without significant use of a hacksaw. And since we're not sure whether Intel gave us or loaned us that chassis, I'm not ready to start sawing.

Does anyone out there have an Intel SR2300 or compatible 2U chassis (with PS)? There's a list of possible chassis here.

On the plus side, we switched fully to our new ISP, which will save us several thousand dollars a month.

The rest of the day I'll get the interface for putting the paypal donations into the database working.

Eric

How expensive is it? Just in case nobody just has one in store?

And good you can save some money on the networking, that should ease things a bit for you. :-)

Thank You, I called Eric with an option that might solve the problem from a company that parts out used servers... I also asked about the 2U or 1U issue... If this model will fit, it will sure beat the heck out of the cost of a new one... It would be about $600.00 with shipping

So as My Day was spent chasing other things... Now have more to research... That will teach me to sit down and read email... Hahahaha

I truly appreciate the effort (more than most users could ever guess)!

The last piece of the puzzle to get "kryten's" replacement built is the Server Case... Several Users have contacted me and they were put in touch with Eric... Over the last week, we have received the two brand new 2.8GHz Xeons that fit the donated server board and what should be 12 gig of RAM... It was requested that names of those users not be divulged, to prevent things from happening in the Forums... That was over $1,700 of donations that will have No Star (other than Eric and the Seti Staff Knows, if you can find me and twist my arm I won't tell!). They do not want fame, they only want Seti to Survive!

I Truly Thank Everyone that has Helped! There are many "unsung heroes/heroines," they just want Seti to Survive...

If enough people considered it I could risk using my PayPal for "just this"... Right now, I am stuck in the middle of my step son's $2000 car repair bill to keep him working... Otherwise I would have bought the Server Case...

So it is something that I missed, or a "new" listing from the same company... You do not know how many Salvage Dealers I have Googled for... So the next part is to see what they stripped out to drop the price... It is truly nice to see someone else starting to look... I have more browser windows open and at this point, need to buy more RAM... 1 gig of RAM and the OS is complaining that the 1.5 gig swap is getting full... No wonder this machine has less than a 99% average doing Seti...

Regards

OK Pappa, I found a case. I would hope someone could buy this case, as I can't, and I'd love to buy it for Seti.

With consumer PSUs the wattage is usually the overall rating on all rails. This is problematic because in some cases you need a strong 5V rail and in most cases you need a strong 12V rail; the latter is usually the lacking one.

What's different between general desktop PSUs and server/overclocking PSUs is that in the latter there are several 12V rails and sometimes even multiple 5V rails, to help with stability.

That way you can spread load over the rails better or put certain hardware, which is sensitive to voltage fluctuations, on its own rail.

With general desktop PSUs, you usually get stuck with 1 rail, so if you have 1 device sucking the voltage down to 11V other hardware will be hurting, badly (disks can even randomly corrupt).
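To make the per-rail point concrete, here is a rough sketch of budgeting current by rail instead of trusting the total wattage. The rail names, ratings and device loads are all hypothetical; real figures would come from the PSU label and the hardware data sheets.

```python
def rail_loads(devices):
    """Sum the current draw (amps) per rail from (rail, amps) pairs."""
    totals = {}
    for rail, amps in devices:
        totals[rail] = totals.get(rail, 0.0) + amps
    return totals

def overloaded_rails(devices, rail_limits):
    """Return the rails whose summed draw exceeds the rated limit."""
    totals = rail_loads(devices)
    return sorted(rail for rail, amps in totals.items()
                  if amps > rail_limits.get(rail, 0.0))

# Hypothetical ratings and loads, for illustration only.
limits = {"12V1": 18.0, "12V2": 18.0, "5V": 25.0}
devices = [("12V1", 10.0), ("12V1", 9.5), ("12V2", 6.0), ("5V", 12.0)]
print(overloaded_rails(devices, limits))
```

In this made-up case the total load is well within the supply's overall rating, yet one 12V rail is over its limit, which is exactly the situation where moving a device to the second rail helps.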

If you can afford one of these and want to ship directly to UCB, then please contact me... I can give You the shipping address.

al.setiboinc (at) gmail.com

Why are we currently setting up a system for the Multibeam Workunits that uses drive types that are obsolete and hard to find now? What happens in 5 years and we now need drives that cannot be had for any price? What then??

It is not that it is bad hardware, just that hard drives die...
"When that is all you have to work with," it is all you have...

If I may say so, I think the geek deserves a more responsive answer than that. A brief Search reveals that there are lots of *much* more expensive 73GB SCSI drives available. There are also lots available for about half the price, but they tend to have about a millisecond longer average seek time. I can make up reasons why this particular drive is specified, but I do not *know* why.

Why 73GB? Why 8? Does that include some as spares? Does SETI@home already have lots of these drives?

This is by no means a hostile response. I would really like to have a bit of "my" hardware contributing to our work and will buy one for you if you can provide sensible answers to good questions.