SSDs are like a bad drug. They're the best improvement one can make to a PC, but...

My 4th OCZ SSD failed today at a remote site. With 4 out of 13 drives failing after a few months of use, that puts the failure rate at about 30%. Is this an anomaly? Am I just being paranoid about the life expectancy of the other SSDs I've got in the field? At the end of this post I'll refer you to an article at codinghorror.com that really made my heart sink.
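For anyone wondering how much weight 4 failures out of 13 can carry, here is a rough sketch of the arithmetic in Python. It computes the observed rate and an approximate 95% Wilson score interval. The interval method is my own choice for illustration, and it assumes the drives fail independently, which may not hold if they came from the same batch.

```python
import math

def wilson_interval(failures, total, z=1.96):
    """Approximate Wilson score interval for a failure proportion.

    z=1.96 corresponds to roughly 95% confidence. Assumes independent
    failures, which is an assumption, not something the data proves.
    """
    p = failures / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

failures, total = 4, 13
rate = failures / total
low, high = wilson_interval(failures, total)
print(f"observed failure rate: {rate:.1%}")
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

With only 13 drives the interval is wide, so the sample alone cannot say whether the true rate is merely bad or catastrophic.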

The first three just died without warning. They simply disappeared from BIOS one morning upon boot-up. The fourth started hanging up the system, even after booting the system from another drive. First Windows Explorer would stop working, then the mouse, then total freeze -- maybe 3 minutes in. We tried saving a few files from the SSD after connecting it to a lab system via a USB dock but after about 10 files it would freeze and require power-cycling.

The cost to me after each failure is that I must send someone onsite to replace the drive with a new one, restore the new drive from an image made when the system was first delivered, update Windows, drivers and software, and install any drivers or reconfigure to adapt to any changes since the image was made. That's a costly loss, plus I pay for shipping and weeks later end up with a replacement drive that I frankly don't trust.

So what is OCZ's reply upon my asking for more than just another costly RMA?

Please avoid cloning operating system images onto the drive as this has been known to cause issues similar to what you've been experiencing. A fresh install works best.

Okay, so they're saying their SSDs cannot accept images, which all PC manufacturers use to build systems and most enterprise IT departments use to deploy them. Are SSDs solely for individual end-users to do new bare-metal installs on?

I wrote back asking if an SSD buyer is supposed to update the drive's firmware before installation. I also asked if the owner should continue updating it when new firmware is released. Since restoring from an image is not an option, this means the owner must do another bare-metal install after every firmware update, right?

When does one know that their SSD's firmware finally has the bugs worked out so it won't catastrophically fail? This would be nice to know given that a bare-metal install is required after each firmware update.

Now maybe the "secure erase" is optional, but at the link they provided it says:

"This bootable set of tools will secure erase as well as update the firmware of your SSD. "

Despite two documented assertions by OCZ that erasure is part of the deal, I'd have to try this bootable set of tools myself to know for sure. But all my working drives are in the field being used daily so I can't try their tools.

The replacement SSDs I've been buying have not been OCZs. (Would you have replaced them with OCZ SSDs?) I hoped that by transitioning to another brand I could avoid repeat failures. Then I found this post at codinghorror.com. The author relates the story of a friend who bought 8 SSDs over the last two years and all failed. He used a variety of brands.

So what's going to happen with all the systems that are being sold or upgraded with SSDs? Are they all being hand-built with a manual OS install?

Are the SSD owners periodically updating the firmware on their SSDs, doing a clean bare-metal install every time?

At the moment, all I can say is "back up that SSD C: drive religiously." And enjoy the crack-SSD as long as you can!

Update from OCZ tech support

OCZ replied today and it's interesting what they wrote.

Firmware updates should only be applied if the drive is malfunctioning similar to your case. We do not recommend users to update the firmware if the drive is functioning properly. You can apply the firmware update without the secure erase and see if the issues have been resolved. If they haven't we then recommend that you perform the secure erase and a fresh install.

As for image cloning, SSD Controllers have to translate logical blocks to physical nand locations. This complicates mapping snapshots when imaging 'incrementally'. If you decide to clone your drives, please use whole disk images as they include all tables in the disk signature and are preferable with your OCZ SSD. Please also perform the clones from external bootable media rather than internal to the OS.

Two questions immediately arose, which I submitted.

Thank you for your informative reply. I wonder if and how cloning could cause the kind of failure 3 of the 4 SSDs had. I mean, the drives worked fine for two or more months before suddenly becoming unrecognizable to any BIOS as drives. Can imaging actually lead to this kind of failure months later?

Secondly, you write that "Firmware updates should only be applied if the drive is malfunctioning similar to your case." In the case of this latest drive, which still shows up in BIOS, I can see how your firmware update software might still communicate with the drive (and I'm having a tech try that as I write this), but can your firmware software update an SSD that the BIOS cannot see?

I plan on also asking what exactly they mean by "external bootable media." Can it not be Windows 7 on another HDD? Their reply also raises a question that I think many individual tech-savvy SSD users might have: can programs like Terabyte's Image for Windows (or DOS), Acronis True Image, or one of Paragon's programs be used to perform the kind of disk copying or imaging that OCZ recommends? For that matter, there's "imaging" and there's simply copying an entire drive. "Imaging," at least to me, means to create a .IMG file of a partition or drive using a program like Terabyte's Image for DOS. It's akin to an intermediate file that must be restored to a drive to end up with what you originally imaged. "Copying a drive," as one can do with Paragon Partition Manager, is quite different. I need to ask OCZ if simply copying an SSD to another drive, like an HDD or another SSD, is possible. And I will.

reply from imaging software support

A support guy from one of the popular imaging software vendors wrote this about OCZ's most recent reply.

"You should be able to read/write any logical block you want, it's up to their firmware to pick out where it's going to put it. Maybe they are referring to some performance issues since logical blocks may not be adjacent to physical locations, but it's not going to affect our differentials as the view seen by the app is going to be that of a normal hard drive .. blocks 0 to n.

Sounds like they are saying they want you to do full backups. Doesn't make sense to me what that has to do with reliability of their hardware. Maybe you should move to a different brand that can work no matter what blocks you write to."
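To picture what the support guy is describing, here is a toy sketch in Python of a logical-to-physical translation layer. The class name ToyFTL and everything inside it are hypothetical and vastly simplified. It is not OCZ's firmware or any real controller's design, and it has no garbage collection or wear leveling. It only shows that the application addresses logical blocks while the drive decides where the data physically lands.

```python
class ToyFTL:
    """Toy flash translation layer: hypothetical, not any vendor's firmware."""

    def __init__(self, num_pages):
        self.mapping = {}                # logical block address -> physical page
        self.flash = [None] * num_pages  # simulated physical pages
        self.next_free = 0               # next unwritten physical page

    def write(self, lba, data):
        # Flash pages cannot be overwritten in place, so each write goes to
        # a fresh page and the old page (if any) simply becomes stale.
        if self.next_free >= len(self.flash):
            raise RuntimeError("toy model has no garbage collection")
        self.flash[self.next_free] = data
        self.mapping[lba] = self.next_free
        self.next_free += 1

    def read(self, lba):
        # The caller only ever sees logical block numbers.
        return self.flash[self.mapping[lba]]

ftl = ToyFTL(num_pages=8)
ftl.write(0, "boot sector v1")
ftl.write(5, "some file")
ftl.write(0, "boot sector v2")  # rewrite of the same logical block
print(ftl.read(0), "is stored in physical page", ftl.mapping[0])
```

In this model an imaging program writing blocks 0 to n never sees the physical layout, which is the point the vendor's reply makes.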

Now how do I find out if other SSD manufacturers have the same unpublicized limitation (if indeed OCZ confirms that imaging or disk copying to an SSD creates an unusually failure prone drive)?

OCZ Drives Experience, week 3

Drive #1, V3 - DOA - Powered up, not found by my LSI, not found by my on board SATA, not found by 2 other computers
Drive #2, V3 - Installed as a stand alone drive for my Windows 7 64-bit installation. Worked for 3 days, then midday my controller sounds like a heart monitor in a hospital, only flat lined, not alive and well.
Drive #3, S3 - Installed as a stand alone drive for my Windows 7 64-bit installation. Worked for 1 day, then again, flat lined, Windows just froze, forcible shutdown, reboot and the LSI bios setup sees no drive.
Drive #4, S3 - Still sitting here at my desk, afraid to open and waste more time re-installing all of my apps.

I am far less than impressed with OCZ and their very expensive high performance SSD drives. I just received the automated email that my trouble ticket has been acknowledged. Thanks OCZ! How about acknowledging you've got issues! Now I have to spend another $25-$30 to RMA these drives back to OCZ where it will take 3 weeks to get new ones back.

Confliction...

RyderOCZ, "forum support manager" in the OCZ forum, replied to my thread there with:

"Cloning HDD to SSD is no different than HDD to HDD, provided the image is correct and not corrupt.

The only thing that will happen with a clone is that the SSD will take some time to re-synch its drive map because SSD's and HDD's handle data differently on the physical level. There is no "rule" that says you can't clone an HDD to an SSD."

I don't know if his position in the forum as "forum support manager" with an "Official Staff" label means he has access to the same information OCZ support has but, uh, his reply seems to conflict with OCZ's earlier assertion about using only "whole disk cloning".

Certainly one wouldn't expect to rely on a corrupted image file. I doubt most cloning software would even finish restoring a corrupted image file.

Possibly a new path to resolution

Two other OCZ forum members with "Official Staff" labels have replied. Here are the salient points:

Any response you got from online Support regarding restoring images would have been with consumer level implements in mind.

As Praz has already mentioned, advice regarding restoration of images onto drives IS AIMED AT CONSUMER LEVEL IMPLEMENTATIONS WHERE PREVIOUSLY TAKEN IMAGES COULD BE MIS-ALIGNED OR WHERE LOGICAL TO PHYSICAL MAPPING COULD CONTAIN ERRORS AT VOLUME REGISTER LEVEL.

Corporate\Client level image restoration from known good/aligned images would not necessarily fall within this area of advice.

The other respondent wrote that I need to be dealing with "HQ" and the "Enterprise Team". I'm game and have asked how I get around the normal support channel and reach these entities.

Still, I wonder:

1. if and how imaging, cloning, copying or whatever can cause an SSD to fail after a few months of quite successful use.

2. if consumers of SSDs can use widely available imaging or disk copying software to migrate from an HDD to an SSD. It's easy to tell if your SSD is misaligned after the fact, and Paragon makes an alignment tool if that's called for. Does this solve the "mis-aligned" issue cited above? Insofar as "logical to physical mapping containing errors at volume register level"... how does a consumer avoid THIS!?

I can feel your pain and frustration, but in my experience OCZ have no worse reliability than other mainstream SSD brands.

A clean install is recommended for consumer swap out because it is guaranteed to align the data written to the NAND Flash boundaries. A clone may not do that and there is some debate about off the shelf cloning programs preserving the data offsets required for the NAND Flash.

If the data does not align correctly, when one writes to a logical block of data one may be writing to two physical blocks - dramatically increasing the wear-out risk. Furthermore, older OS's such as XP do not implement Trim natively.

So, if the owner of a new shiny SSD clones XP with a non-qualified program, he/she can expect a world of pain a few months down track, and yes that can result in the drive not being recognised as a drive by the BIOS. If however the same user implements a clean install, even with XP, they can expect a longer lifetime if they implement the usual safe working practices with respect to SSD wear-out.

Better still if they implement a cloning using the native System Imaging tool in Win7, they will automatically align the new data to the logical NAND boundaries and enjoy native support for Trim and Win7 also will take care of most of the other important stuff like turning off defrag etc.

As has been noted elsewhere, Win7 support for SSD's makes a huge difference and although by no means perfect its System Image tool guarantees alignment of data.

For system admins deploying lots of cloned images, it's best to ensure the cloning tool respects the data boundaries and to push hard for an OS that will support Trim and take care of the drive.

For consumers, one has to pitch at the lower end of the knowledge pool and that means recommending a clean install as it guarantees good data alignment regardless of OS choice.

A final word: OCZ SSD drives that I have seen come with a 3-year warranty. Any business offering that must be confident of their underlying technology. Perhaps you were unlucky or had a sub-optimal implementation.

I have 3 OCZ SSDs at work and so far no problems, but it's only been a few months for the oldest.
I've already cloned and then used Paragon's PAT (partition alignment tool) to properly align the partition on one of them so I guess I'm a good candidate for failure?
If they do start to fail at least we'll know it's not environmental because mine are all in well-cooled desktop systems, used heavily on a daily basis, no automatic defragging or indexing is going on but they are not TRIM supported--we shall see.

I started by installing Windows 7 64-bit on an SSD. I loaded the Intel AHCI driver during installation. I then made an image of the drive and saved it to an external HDD. I also copied the SSD to another HDD so I can keep it updated. Let's call this HDD "the master disk". The other system components are always the same: Intel DQ67SW, 500GB Seagate HDD for a data partition, etc.

When it's time to build another system I copy that "master disk" HDD to a new SSD. I then boot the new system with that SSD, change the key to a new one and activate it. I check alignment with Paragon's Alignment Tool and fix it if needed.

Upon delivery I add specific drivers and applications, and then I make an image of the SSD onto that PC's data HDD -- in case the SSD fails or gets corrupted and I need to restore the "C" drive.

For imaging I use Image for DOS by Terabyte. No special settings. For drive copying I use Paragon's Partition Manager Pro. It never fails to create a bootable copy of a Windows 7 boot drive.

This is not mass production, just an occasional deployment of a new PC. Do you see anything about my process that would lead to the problems you described such as mis-alignment of data (e.g., "writing to two physical blocks")?

The process you describe seems to me to be very good. The SSD drivers are loaded onto your master image and since all other hardware is the same, a new SSD in a cloned machine should run fine. However, that assumes that the new SSD requires the same drivers, that the firmware is the same and the bios on the motherboard is identical.

For small scale deployments that are fully under your control, those assumptions are likely to be reasonable, but worth double-checking.

The only thing I can't comment on then is the performance of the Image for DOS and Paragon Partition manager in relation to the data alignment. Perhaps others here know the answer to that, or maybe a lookup on the vendor websites may provide verification.

If your cloning tools do respect the data boundaries, then there may be some other root cause of the failures, but I don't think it's related to the process you have used as that seems ok.

Since you are running Windows 7, maybe consider using the built-in Image backup tool, and restore from that image using the installation DVD: that will give you guaranteed data alignment.

It doesn't matter what you are using to restore an image with (as long as it's competent software of course) IF you are checking the partition alignment with Paragon's PAT and aligning if needed.
So I also think there may be some other commonality to the cause of demise that hasn't been considered.
For the one who lost 3 for 3, there seem to be commonalities in that they were all purchased at the same time, probably from the same source and there was an extra layer of hardware translation (LSI controller) that seemed to be involved with all three. Bad luck indeed but maybe skewed by other circumstances.

Status update for thread reader

Some of my questions remain unanswered, most notably whether using a misaligned SSD can cause the failures I've experienced. Three drives "died in their sleep" so to speak: the systems booted up one morning and the SSDs were invisible to the BIOS. The fourth drive began to hang up any system to which it was attached and within a few minutes would lock it up completely. Can misalignment cause any of this?! (Moreover, I had used Paragon's Alignment Tool before delivering these systems, which makes me doubt they were misaligned in the first place.)

So far the only specific effects of misalignment that I've been told about are "decreased life" and "reduced write speed". To quote a thread at Windows 7 forum: "Without the alignment, the sector boundaries and the page boundaries will not match and sectors will span pages. That would require for a Windows write operation to clear two blocks in lieu of only one thus reducing the write speed by 50%." Here's an article that appears to be a complete explanation about the need for alignment.
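The "sectors will span pages" point is easy to check with a little arithmetic. The sketch below counts how many flash pages a single 4 KiB cluster write touches for a given partition starting offset. The 4096-byte page size is an assumption for illustration (real NAND page sizes vary by drive). The two offsets are the usual defaults: 63 sectors for XP-era partitioning and 2048 sectors (1 MiB) for Vista and Windows 7.

```python
def pages_touched(partition_offset, cluster_index,
                  cluster_size=4096, page_size=4096):
    """Count the flash pages covered by one cluster write.

    page_size is an assumed value for illustration only.
    """
    start = partition_offset + cluster_index * cluster_size
    end = start + cluster_size - 1            # last byte of the cluster
    return end // page_size - start // page_size + 1

XP_OFFSET = 63 * 512       # 32256 bytes, not a multiple of 4096
WIN7_OFFSET = 2048 * 512   # 1048576 bytes, a multiple of 4096

for name, offset in (("XP-style", XP_OFFSET), ("Win7-style", WIN7_OFFSET)):
    print(name, "offset:", pages_touched(offset, cluster_index=0), "page(s) per cluster write")
```

Under these assumptions every cluster on the 63-sector layout straddles two pages, which fits the "decreased life" and "reduced write speed" claims but says nothing about a drive vanishing from the BIOS.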

Nothing I've read suggests that misalignment could cause an SSD to become uncommunicative after a couple months. I remain suspicious that the high percentage of SSD failures I and others have experienced are not attributable to misuse.

The failures aside, as a result of this exercise I have confirmed some do's and don'ts that are probably worth adhering to, to ensure maximum life and performance of an SSD. (This is not guaranteed to be 100% complete or correct, thanks to GIGO.)

1. Before using the SSD make sure it's got the latest firmware. If it's early in your particular SSD's life cycle it might be wise to update its firmware over time as well. (You should definitely image the drive before updating the firmware, in case the firmware update goes awry.) I've heard it stressed repeatedly that SSDs are new technology. The manufacturers are still learning. It's tricky making an SSD emulate a mechanical hard drive. As a result I think there's a lot of overhead and possibly unintended consequences. It wouldn't surprise me if "Windows Green 2015" and PC architecture of the future skips the "disk drive" paradigm entirely.

3. Windows 7 appears to be the only decent Windows OS suitable for SSD use because it ensures proper alignment and supports TRIM. Vista does too but why use it?

4. A clean install of Windows 7 ensures proper alignment and will disable some Windows functions that are not appropriate for SSDs (like defragmenting). There are plenty of lists around the Net about what to change in Windows 7 to accommodate an SSD, and it can get fairly technical (e.g., "Given the very high random I/O rates offered by SSDs, the speculative page fetching done by Superfetch isn't needed and, if disabled, will reduce power consumption and wear by reducing I/O and page processing activity.") so I suggest looking at SSD Tweaker or SSD Tweak Utility Pro at elpamsoft.

5. Instead of doing a clean Windows 7 installation it is okay to copy a Windows 7 HDD to an SSD, or restore an image to the SSD, assuming the Windows 7 installation is uncorrupted and will fit on the SSD. You just need to ensure that the result is aligned (various methods exist to determine alignment, and Paragon's Alignment Tool is a quick fix for it). Paragon's Partition Manager Pro 11 will ensure alignment when copying a drive or partition, and when resizing or moving partitions. Terabyte's Image for Windows/DOS/Linux requires specific settings to ensure alignment on an SSD. Regardless of what software you use, it's my impression that the end-result is all that matters: check alignment and fix it if needed.

If you do use disk copying or cloning from an image file, you must then make the aforementioned changes in Windows to accommodate the SSD (e.g., use SSD Tweaker).
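For the "check alignment" step, one of the simpler methods is to look at each partition's starting offset and see whether it divides evenly by the boundary you care about. Here is a minimal Python sketch of that check. The partition names and offsets in the list are made-up sample values, and the 4096-byte boundary is an assumption (many guides prefer a 1 MiB boundary instead). On Windows the real offsets can be read from the StartingOffset property of the Win32_DiskPartition WMI class, for example with "wmic partition get Name, StartingOffset".

```python
def is_aligned(starting_offset, boundary=4096):
    """True if the partition's starting byte offset is a multiple of boundary.

    boundary=4096 is an assumed value; pass 1024 * 1024 for a 1 MiB check.
    """
    return starting_offset % boundary == 0

# Hypothetical sample data: (partition name, starting offset in bytes).
partitions = [
    ("Disk #0, Partition #0", 1048576),  # typical Windows 7 clean install
    ("Disk #1, Partition #0", 32256),    # typical XP-era 63-sector offset
]

report = {name: is_aligned(offset) for name, offset in partitions}
for name, ok in report.items():
    print(name, "aligned" if ok else "MISALIGNED")
```

This only tells you whether the partition start is aligned. It does not replace a tool like Paragon's, which also moves the data to fix a misaligned partition.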

Now.... we get to a new question I'm going to ask: if you install Windows 7 to a new SSD (or copy a Windows 7 HDD or image to it), what should you do if you want to replace that installation with a new one? Let's say, for example, the SSD installation gets corrupted and the most efficient solution is to restore it from an image. Before overwriting most if not all of the SSD is there some process one should go through to ensure that the SSD's MLC cells are cleared and available? (I suspect this is where a "full erasure" tool comes in but haven't researched that yet.)

Oh, and one more question I'm gonna ask: when a manufacturer replaces an SSD under warranty do you get a brand new one or one that's been "refurbished"? (Given my moron-level understanding of MLC technology, refurbishing doesn't seem possible.)

Timmy, Thanks for all your valuable advice and research on this subject. It does indeed make one contemplate whether the possible problems make the rewards worth it. Who knows. For my use, conventional HD's work just fine.

Refurbished might mean someone snapped off the power/data connector or something like that, which was then repaired. There are some in-house diagnostics they can run on their own drives, and maybe they can repair some through firmware reapplication, but as far as the storage itself goes I think they would be replacing the unit if it was bad.
I've heard more stories that SSDs don't fail than I have that they fail prematurely. I've got 11 OSes running simultaneously in 4 computers now and a key component in the feasibility of this undertaking is SSDs...and I'm so addicted to extracting power and performance per cycle...I can't go back!!
I can retire 2/3rds to 3/4ths of the herd if this works out.