SSD vs. HDD: Performance and Reliability

Thursday May 3rd 2018 by Christine Taylor

Should you buy that new storage system with SSD, HDD, or both? The answer depends on understanding the balance of cost, performance, capacity, and reliability between these two storage technologies. The ultimate goal in most cases is to create a combination of HDD and SSD that fits your workloads and budget.

SSDs are higher performing, but come at a premium cost. Not every workload needs that level of investment. Capacity comes into play as well, with SSDs capable of higher capacity than HDDs. But once again, higher cost makes buying SSD for capacity a very expensive proposition and a poor one for long-term storage. Finally, the differences in reliability are a little murkier but in general there are no glaring differences between the two types of media.

Performance Differences in SSD and HDD

The performance difference between SSD and HDD is clear: speed is SSD's primary differentiator, because mechanical limits cap how fast an HDD can go.

HDD is Slower, Because Physics

In a hard disk array, the controllers direct read/write requests to physical disk locations. The platter spins and the drive heads move to the designated location. Non-contiguous writes force extra head movement, which adds more delay. All this physical movement adds up to latency that SSDs do not suffer from.

· Seek time, the largest component of access time, is the time it takes for the disk drive to move the heads to the target read/write track.

· Rotational latency is the time for the requested sector to move under the head.

· Transfer rate measures how fast data transmits to and from the read/write heads.
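
To make the three delays above concrete, here is a minimal sketch of how they add up for a single random request. The average rotational latency follows directly from spindle speed (half a revolution); the head-positioning times are illustrative assumptions, not vendor figures, and the function names are hypothetical.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: on average the sector is half a turn away."""
    return 0.5 * 60_000.0 / rpm


def random_iops(seek_ms: float, rpm: float, transfer_ms: float = 0.0) -> float:
    """Rough ceiling on random IOPS for one drive: one request per service time."""
    service_ms = seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms
    return 1000.0 / service_ms


if __name__ == "__main__":
    # (rpm, assumed head-positioning time in ms); the seek values are assumptions.
    for rpm, seek_ms in [(7200, 8.5), (15000, 3.5)]:
        latency = avg_rotational_latency_ms(rpm)
        print(f"{rpm} RPM: {latency:.2f} ms rotational latency, "
              f"~{random_iops(seek_ms, rpm):.0f} random IOPS")
```

Under these assumptions a single spindle lands in the range of tens to a few hundred random IOPS, which is the mechanical ceiling the section describes.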

A hard disk drive, unlike a solid state drive, is filled with moving parts.

SSD Performance Wins for High IOPS

SSD has no moving parts and thus no physical seek limits, so the SSD can access memory addresses much faster than the HDD can move drive heads. This dramatically higher performance makes SSD the ideal choice for high-IOPS workloads. Tier 0 storage frequently uses only SSD, although Tier 1 may combine high-performance SSD with 15K RPM HDD.

Even so, SSD has not replaced HDD across the data center, for two main reasons.

· Higher SSD cost. Although SSD costs are consistently falling year over year, so are HDD prices. The two have stayed roughly parallel over the last several years. Today, HDD averages 3 cents per GB while NAND SSD averages 25-30 cents per GB. (You might see a vendor’s blue-sky announcement that their NAND SSD will cost 3 cents per GB, but… vendor’s blue-sky announcement.) The higher price keeps SSD in the high-performance sector where buyers can justify the higher cost.

· Huge installed HDD base. Another reason is HDD’s massive installed base. It makes zero business sense to replace entire servers and storage systems at an astronomical cost, simply to substitute faster SSD for HDD. Most workloads do not need that level of performance at all.
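
The per-GB prices quoted above translate into raw media cost as follows. This is a back-of-the-envelope sketch only: it uses the article's figures (3 cents per GB for HDD, 25-30 cents per GB for NAND SSD), assumes decimal units (1 TB = 1000 GB), and ignores controllers, enclosures, power, and RAID overhead. The 100 TB capacity and the 10% SSD share are hypothetical.

```python
HDD_CENTS_PER_GB = 3.0            # figure quoted in the article
SSD_CENTS_PER_GB = (25.0, 30.0)   # low and high ends of the quoted range


def cost_usd(capacity_tb: float, cents_per_gb: float) -> float:
    """Raw media cost in dollars, assuming 1 TB = 1000 GB."""
    return capacity_tb * 1000 * cents_per_gb / 100


def hybrid_cost_usd(capacity_tb: float, ssd_fraction: float,
                    ssd_cents_per_gb: float,
                    hdd_cents_per_gb: float = HDD_CENTS_PER_GB) -> float:
    """Media cost when ssd_fraction of the capacity is SSD and the rest is HDD."""
    ssd_tb = capacity_tb * ssd_fraction
    hdd_tb = capacity_tb - ssd_tb
    return cost_usd(ssd_tb, ssd_cents_per_gb) + cost_usd(hdd_tb, hdd_cents_per_gb)


if __name__ == "__main__":
    capacity_tb = 100  # hypothetical array size
    print("All HDD:", cost_usd(capacity_tb, HDD_CENTS_PER_GB))
    print("All SSD:", [cost_usd(capacity_tb, c) for c in SSD_CENTS_PER_GB])
    print("10% SSD hybrid:", hybrid_cost_usd(capacity_tb, 0.10, SSD_CENTS_PER_GB[0]))
```

At these prices 100 TB of HDD media costs about $3,000 against $25,000 to $30,000 for SSD, while a 10% SSD hybrid comes to roughly $5,200.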

Some high-performance data centers are moving closer to all-flash by migrating nearline and secondary data to edge or cloud storage. However, even Gartner, which is bullish on flash, states that fewer than 10% of data centers have an all-flash array. (SSD is common, but mostly in hybrid arrays.)

SSD and HDD Reliability Issues

Outside of environmental considerations, SSD and HDD reliability is a murkier issue than cost/performance. Are SSDs more reliable than HDDs, or vice versa? It depends.

SSDs are extremely reliable in harsh environmental conditions because they have no moving parts to break. SSD will laugh off extreme cold and heat, being dropped, and experiencing multiple G’s. HDD, not so much.

However, data centers typically do not experience arctic temperatures or liftoff. And SSDs do have some hardware that can fail such as transistors and capacitors. Wayward electrons can also do some damage and failing firmware can take an SSD down with it. Wear and tear are also issues, and even SSD memory cells eventually wear out.

So, if you are flying to the moon, then SSDs make sense. Otherwise, environmental conditions in a data center will not have a big effect on reliability.

The lowly flash drive, unlike the HDD, has not a single moving part in its architecture.

Reliability Measures: HDD

Even when HDDs are running in an environmentally safe location (and not being dropped on their heads), internal threats include equipment failures, data errors, and head crashes.

· Equipment failure is usually due to wear and tear or manufacturing defects. Manufacturers measure HDD reliability by running clusters of disk models and families, and using the resulting reliability numbers to produce mean time between failures (MTBF) or annualized failure rates (AFR).

· Head crashes are a common cause of a failed HDD. They happen when the read/write heads scrape or touch the surface of the platter.

· Data errors have several causes. Firmware and the OS can identify some errors; others go undetected until the hard drive fails. Error correcting code (ECC) helps to protect against data errors by writing data into protected sectors. ECC reserves space on each sector to store information about the sector’s user data. When the heads read the user data back, ECC uses the data and ECC bits to report any errors to the controller. It can remediate some errors and sends alerts about the ones it cannot correct.
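
The ECC idea in the last bullet, storing extra bits alongside the user data and using them on read to locate and repair an error, can be illustrated with a toy Hamming(7,4) code. This is a stand-in chosen for brevity, not what drives actually ship: production firmware uses far stronger codes over whole sectors. The function names are hypothetical.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Return (data_bits, error_position); error_position is 0 when the read is clean."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3   # syndrome = 1-based position of the flipped bit
    if error_pos:
        c[error_pos - 1] ^= 1          # repair the single-bit error in place
    return [c[2], c[4], c[5], c[6]], error_pos


if __name__ == "__main__":
    stored = encode([1, 0, 1, 1])
    stored[4] ^= 1                     # simulate one bit flipping on the media
    print(decode(stored))              # recovers the data and reports position 5
```

A code this small can only repair one flipped bit per codeword; two flips would be miscorrected, which is one reason real drives use stronger codes and still report some errors as uncorrectable.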

As for reliability results, HDD makers generally use AFR as their reliability index, and published figures average between 0.55% and 0.90%. However, HDD manufacturers do not report how many under-warranty disks they replace each year. Not every replacement reflects a faulty disk: the cause could be an overheated data center, a dropped system, or a natural disaster. Still, some replacements are due to failed disk drives, and replacement rates range much higher than AFR, from 0.5% to as high as 13.5%.
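
MTBF and AFR are two views of the same failure rate, and the conversion is a useful sanity check on vendor figures. The sketch below assumes a constant failure rate (the exponential model) and drives powered on around the clock, 8,760 hours a year; both are simplifying assumptions, and the 1,000-drive fleet is hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # assumes drives are powered on 24x7


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate (as a fraction) under a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_from_afr(afr: float) -> float:
    """Inverse of afr_from_mtbf: MTBF in hours implied by a given AFR."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


def expected_failures(drive_count: int, afr: float) -> float:
    """Expected number of drive failures per year in a fleet."""
    return drive_count * afr


if __name__ == "__main__":
    for afr in (0.0055, 0.0090, 0.135):   # AFR range and top replacement rate from the text
        print(f"AFR {afr:.2%}: MTBF ~{mtbf_from_afr(afr):,.0f} h, "
              f"~{expected_failures(1000, afr):.1f} failures/year per 1,000 drives")
```

For a fleet of 1,000 drives, the quoted AFR range implies roughly 5 to 9 failures a year, whereas a 13.5% replacement rate would mean about 135 swaps.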

Reliability Measures: SSD

SSDs are better in harsh conditions but can still fail from a variety of causes. Common failure points include bit errors, where random data bits are stored to cells (also referred to as leaking electrons). Flying or shorn writes are writes written to the wrong location, or truncated (shorn) thanks to power loss. Unserializability means that writes are recorded in the wrong order.

Firmware is also to blame for some SSD reliability failures, as it is subject to failure, corruption, and improper upgrades. And electronic components like chips and transistors can fail. Finally, although NAND flash is non-volatile, a power failure can corrupt a read/write action.

Improving Reliability

Reliability is less of an issue today than it was, thanks to reliability technologies like wear leveling and data integrity checks.

· Wear leveling monitors component wear and data movement across cells. It also writes and erases across multiple cells to extend the life of the media by mapping logical block addresses (LBA) to physical memory addresses. It then rewrites data to a new block each time (dynamic) or reassigns low usage segments to active writes (static) to avoid consistent wear to the same memory segment.

· Data integrity checks include error correction code (ECC), Cyclic Redundancy Check (CRC), address translation, and versioning. ECC checks data reads and corrects some hardware-based errors, and CRC verifies that written data is returned intact to a read request. Address translation fights location-based errors by verifying that a read is occurring from the correct logical address, and versioning ensures that a read retrieves the current version of the data.

· Garbage collection firmware reclaims sparsely used blocks. Since NAND SSD only writes to empty blocks, the firmware analyzes cells for partially filled blocks, merges data into new blocks, and erases old ones to accept new writes.
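
The dynamic wear leveling described above can be sketched as a small logical-to-physical map. This is a toy model under loud assumptions: the class and method names are hypothetical, each "block" holds one payload, and an overwritten block is erased immediately rather than reclaimed later by garbage collection. Real flash translation layers work at page and block granularity and are far more involved.

```python
class DynamicWearLeveler:
    """Toy flash translation layer: every write goes to the least-worn free block."""

    def __init__(self, num_blocks: int):
        self.erase_counts = [0] * num_blocks   # wear per physical block
        self.data = [None] * num_blocks        # contents of each physical block
        self.free = set(range(num_blocks))     # blocks that are erased and writable
        self.lba_to_block = {}                 # logical block address -> physical block

    def write(self, lba: int, payload) -> int:
        if not self.free:
            raise RuntimeError("no free blocks available")
        # Choose the free block with the fewest erases (ties broken by index).
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.data[target] = payload
        old = self.lba_to_block.get(lba)
        self.lba_to_block[lba] = target        # remap the LBA to its new location
        if old is not None:
            # The stale copy is erased and returned to the free pool.
            self.data[old] = None
            self.erase_counts[old] += 1
            self.free.add(old)
        return target

    def read(self, lba: int):
        return self.data[self.lba_to_block[lba]]


if __name__ == "__main__":
    ftl = DynamicWearLeveler(num_blocks=4)
    for i in range(100):
        ftl.write(0, f"version {i}")           # hammer a single logical address
    print(ftl.read(0), ftl.erase_counts)
```

Even though the host rewrites one logical address 100 times, the erases end up spread across all four physical blocks instead of wearing out a single one.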

Not SSD or HDD, but Both

The conclusion is you should choose media to suit your workloads based on performance and cost. Reliability is a critical requirement for media of course, but outside of harsh environments SSDs and HDDs have similar reliability rates.

Match your media choices to your workload needs. You might invest in an all-flash array for high-IOPS workloads, but most data center managers buy hybrid arrays with Tier 0 SSD and combined Tier 1 SSD/HDD. SSD costs are too high to justify using it for nearline and secondary storage. And although SSD is capable of higher capacity, data centers primarily need capacity for aging data. Higher SSD capacity would come at a very high cost and has little business justification.

The enterprise data storage marketplace has already drawn the same conclusion.