Hybrid Flash Storage Arrays: When and Where?

Posted on March 03, 2014 By Christine Taylor

Hybrid flash arrays achieve higher performance and lower latency than HDD arrays and cost less than most all-flash arrays. However, they are more costly and add more complexity than traditional HDD arrays. Given the added cost and complexity, what types of applications and environments can most benefit from hybrid arrays?

First of all, let’s look at the problem that flash storage arrays were created to solve: the impact that huge data growth rates are having on the data center. One way to look at the issue is the “3 V’s”: volume, velocity and variety.

Data is growing 100%-plus year over year in many data centers. These massive data volumes make it hard to maintain storage capacity and performance, and to manage the costs of purchasing, maintenance, data center real estate, and energy. Data is also growing at extreme velocity. With growth of 100% or more a year, IT struggles just to keep up. Strategic storage decisions are enormously difficult to make in this fast-moving environment.

Now add the tremendous variety of data that IT must store. There are text and digital files; audio and video; mobile data growth; and one of the biggest data producers in history: machine-generated data. For example, a single airline passenger trip generates gigabytes of data on the flight details, the passenger’s identity, security, seat assignment and frequent flyer status, the weather, the airports, and more. Multiply this single passenger’s data by tens of thousands of people flying every day and you begin to see the sheer scope of massive data generation.

Data and IO

Every time data moves between server and storage it generates Input/Output (IO) activity. Fast-growing data volumes, increasing performance velocity and a huge variety of data types all jack up the amount and speed of IO that networks and storage systems are expected to handle.

The problem is accelerated in server virtualization environments. Virtualization is a fine thing for managing server and application growth, but virtualized servers generate even more IO going to a few storage resources. Technology development routinely increases the amount of IO that servers produce, but due to mechanical constraints storage cannot keep the same pace.

There are IO bottlenecks at every stage of the server-to-spindle pipeline, including controllers, fabric, and the physical servers themselves. But by far the primary bottleneck is storage, due to the mechanics of hard disk drives.

The application makes an IO request to the operating system, which directs the IO to the storage system. The storage system determines the optimal placement on disk for the incoming IO and moves the disk drive heads to the location of the incoming IO. Meanwhile the disk platter constantly rotates beneath the heads.

That is a lot of physical activity, and it severely limits storage performance. Furthermore, a write often has to be broken into multiple IOs in order to fit across non-contiguous locations. Latency due to slow seek speeds affects both writes and reads.

Yet HDDs are extremely limited in how much they can improve performance. Thanks to their mechanics, they top out at 100-200 IOPS (IO operations per second) and a 3.6 millisecond (ms) seek time. So what does it take to push a disk drive up to 200 IOPS? Striping, clustering, and parallelizing for all it’s worth across multiple disk resources and storage controllers.
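The 100-200 range can be sanity-checked with back-of-the-envelope arithmetic: a random IO costs roughly one average seek plus, on average, half a platter rotation. The sketch below is a rough illustration rather than a vendor model. The spindle speeds (7,200 and 15,000 RPM) and the function names are assumptions chosen for illustration, and the estimate ignores transfer time, caching and command queuing.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector to reach the head: half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def estimated_random_iops(seek_ms: float, rpm: float) -> float:
    """Rough random IOPS for one drive: 1000 ms / (seek + rotational latency)."""
    service_time_ms = seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # 3.6 ms is the seek time quoted in the article; the RPM values are assumed.
    for rpm in (7_200, 15_000):
        iops = estimated_random_iops(seek_ms=3.6, rpm=rpm)
        print(f"{rpm} RPM, 3.6 ms seek: about {iops:.0f} IOPS")
```

Under these assumptions both drives land inside the 100-200 range: roughly 129 IOPS at 7,200 RPM and roughly 179 IOPS at 15,000 RPM.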

Yet even those advanced storage techniques can only go so far. The result is a serious storage bottleneck due to seek time and rotational latency. Meanwhile IO requests in the thousands, tens of thousands and even millions are hitting the storage controllers at any given moment.
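To put the gap in numbers, the sketch below estimates how many drives would have to work in parallel to serve a given IO load if each drive delivers the 200 IOPS ceiling mentioned above. The workload sizes and the function name are hypothetical, and the calculation assumes perfect striping with no RAID write penalty or controller limits, so real arrays would need more.

```python
def drives_needed(target_iops: int, iops_per_drive: int = 200) -> int:
    """Minimum number of drives, assuming IO spreads perfectly across all of them."""
    if target_iops < 0 or iops_per_drive <= 0:
        raise ValueError("target_iops must be >= 0 and iops_per_drive must be > 0")
    # Integer ceiling division: round up, since a fraction of a drive is not an option.
    return -(-target_iops // iops_per_drive)


if __name__ == "__main__":
    # Hypothetical workload sizes for illustration only.
    for target in (10_000, 100_000, 1_000_000):
        print(f"{target:>9,} IOPS -> at least {drives_needed(target):,} drives")
```

Even in this best case, a workload in the millions of IOPS would call for thousands of spindles, which is why adding more disks stops being practical.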

Flash to the Rescue?

Flash technology is the most widely accepted solution for redressing the IO bottleneck. Flash covers a lot of territory: some environments deploy PCIe flash cards as server-side caches, and SSDs may be placed up and down the computing stack at the server, networking or storage levels.