Surprise! Overclocking your machine does make it less stable according to Microsoft

If you've ever wondered whether it was worth hitting that 'Report to Microsoft' button after a BSoD, perhaps this paper from Microsoft Research will enlighten you. After studying reports from 1 million machines that suffered CPU or memory problems, Microsoft broke the data down by failure type and by machine type. That lets them contrast the results of overclocking on laptops and desktops from both major CPU vendors, and split the desktops into those assembled by a major vendor and those assembled either by the owner or by a small business.

The basic results are easy to sum up: laptops are less likely to crash than desktops, CPU errors are more likely than memory errors, and underclocking will indeed make your system less prone to crashes. You are also less likely to see crashes on a machine purchased from a major vendor than on one assembled by yourself or by a small business. Of course, the whitebox versus brand-name ratings cannot differentiate between a PC built by a first-timer and one assembled by a veteran, so it is possible that the rating is a little skewed.

As for overclocking, you can see that the results are split between Vendor A and Vendor B rather than being labelled Intel and AMD, but most readers will be able to make an intelligent guess as to which is which. TACT represents Total Accumulated CPU Time, which does not have to be contiguous and could represent quite a few weeks of ownership if the computer in question is only run for a few hours a day and then shut off. Whether this time was accumulated quickly or spaced out, it shows that overclocking either vendor's chips will have a significant impact on the stability of your system. Again, there is no division into experienced overclockers and neophytes, nor between those who overclock manually and those who use software or hardware included with the motherboard they chose. Even so, the impact on stability is very large regardless of vendor, and if you crash once you can be almost guaranteed to crash a second and third time. The table only covers the first three crashes, since by the time the third crash occurs it is obvious they will continue until something is changed. Check out the abstract here or just head straight to the bottom of that page for the full PDF of results.

"Researchers working at Microsoft have analyzed the crash data sent back to Redmond from over a million PCs. You might think that research data on PC component failure rates would be abundant given how long these devices have been in-market and the sophisticated data analytics applied to the server market — but you’d be wrong. According to the authors, this study is one of the first to focus on consumer systems rather than datacenter deployments."

Your conclusion "CPU errors are more likely than memory errors" is erroneous.

After reading the paper you pointed to, I saw that the CPU errors counted were those signaled by a "machine-check exception", so it is likely that most errors caused by the CPU were accounted for.

On the other hand, the memory errors counted by the study were only those that occurred in the kernel pages, which were on average only 1.5% of the total memory.

Even if not all memory errors lead to crashes, the total number of memory errors experienced by those computers was probably 50 to 100 times higher than shown in the tables.
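As a rough back-of-the-envelope check on that range, here is a minimal sketch. It assumes memory errors are spread uniformly across physical memory, which is my simplification and not something the paper states; the function name `scale_factor` is purely illustrative.

```python
def scale_factor(sampled_fraction: float) -> float:
    """How many times larger the whole is than the sampled part.

    Assumes errors are uniformly distributed over memory, so that errors
    seen in a fraction of memory scale linearly to all of memory.
    """
    if not 0.0 < sampled_fraction <= 1.0:
        raise ValueError("sampled_fraction must be in (0, 1]")
    return 1.0 / sampled_fraction


# Kernel pages averaged about 1.5% of total memory (figure from the comment above).
KERNEL_FRACTION = 0.015

factor = scale_factor(KERNEL_FRACTION)
print(f"Estimated multiplier: about {factor:.0f}x")
```

Under that uniformity assumption the multiplier comes out at roughly 67, which sits inside the 50 to 100 range given above.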

The conclusion is that memory errors are more frequent than CPU errors, but not all memory errors cause crashes, while almost all CPU errors lead to crashes. Among the crashes caused by memory or CPU errors, it is unknown which are more numerous, because the study could not identify most crashes caused by memory errors, but the numbers seem to be of about the same order of magnitude.

I'm jealous; that is much more eloquent than what I came up with when trying to express errors versus crashes while putting this together. I went off the hard numbers in the tables, but I should have used "crashes" in the conclusion, as that would be the proper term.

It might be that user/client stats without demos offend me on a personal level; perhaps I could have held my tongue in a better position