NVMe Flash: No Magic Bullets Here

The problem with writing about storage technology is, well, that it tends to focus on storage technology – as though that were an end in itself. Storage vendors might prefer that we look at their wares through such a narrow lens, comparing one product to another based on simplistic engineering metrics. But purchasing storage based on comparisons of speeds and feeds isn't any more correct than looking at storage through the lens of a hypervisor and selecting a proprietary silo for data that fits the hypervisor vendor's dreams of top-to-bottom domination of the hardware/software stack.

How To Make a Smart Choice
Put simply, a smart choice of storage technologies and topologies can only be made from the perspective of workload and data. More specifically, the right storage platform for a given set of data can only be discovered by examining, at a minimum, the access and modification characteristics of the data.

We all should understand this point intuitively, especially those of us who cut our teeth on structured systems development methodologies in which the application inputs and outputs are all mapped out before we design a hosting environment. Yet, the storage industry keeps trying to dumb down this all-important consideration: what the application and its data actually need. Vendors keep trying to substitute what they want to sell instead.

For example, it makes no sense whatsoever to deploy an All Flash Array (AFA) to host data that is hardly ever accessed or to store data that changes rather frequently. Flash doesn't provide the right cost metrics for storing archival data or the appropriate operational characteristics for use with files or objects that require frequent rewrites.

A Star Is Born
Yet, the latest darling of the storage industry -- NVMe Flash -- is being treated like the Solution To All Your Storage Needs.

Slow virtual machines? Deploy NVMe. Need big scaling repositories for file and object? Deploy NVMe. Want to avoid data "friction" and "shelter data in place"? Deploy NVMe, and just power down the flash drive when the data "goes to sleep."

Truth be told, NVMe -- or Non-Volatile Memory Express, a specification for connecting flash memory directly to a PCIe bus, thereby circumventing the latencies generated by connecting flash to a SATA HDD controller -- does not address any of the use cases listed. Slow VMs, for example, are usually not the result of latency in storage I/O, but are instead a reflection of the sequential I/O processing at the "head end" of the I/O bus, the CPU.

The absence of any significant queue depth (queuing up of I/Os waiting to be served by a storage device) means that storage is not creating a chokepoint or source of latency that could cause slow VM performance. Slow VMs in the presence of shallow queue depth suggest to smart folks that VM latency is being caused by something else. So adding a lot of expensive, fast storage won't make an iota of difference.
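Checking queue depth before buying anything is straightforward. What follows is a minimal sketch, assuming a Linux host where /proc/diskstats is available: the eleventh statistic on each line is the weighted time spent doing I/Os, in milliseconds, and dividing its change by the elapsed time gives the average queue depth over the interval. The sample lines, the function names and the threshold of 1.0 are hypothetical, chosen only for illustration; pick a threshold that fits your own devices.

```python
def weighted_io_ms(diskstats_line: str) -> int:
    """Return the weighted-time-doing-I/O counter (ms) from one /proc/diskstats line."""
    fields = diskstats_line.split()
    # fields[0:3] are major, minor and device name; the 11th statistic sits at index 13.
    return int(fields[13])


def average_queue_depth(before: str, after: str, interval_s: float) -> float:
    """Average number of I/Os queued or in flight between two samples of one device."""
    delta_ms = weighted_io_ms(after) - weighted_io_ms(before)
    return delta_ms / (interval_s * 1000.0)


def storage_looks_like_bottleneck(depth: float, threshold: float = 1.0) -> bool:
    """Hypothetical rule of thumb: a sustained depth above the threshold merits a closer look."""
    return depth > threshold


# Made-up samples of the same device, taken five seconds apart.
BEFORE = "8 0 sda 1000 0 8000 500 2000 0 16000 900 0 1200 1400"
AFTER = "8 0 sda 1500 0 12000 700 2600 0 20800 1150 0 1700 1900"

if __name__ == "__main__":
    depth = average_queue_depth(BEFORE, AFTER, 5.0)
    print(f"average queue depth: {depth:.2f}")
    print("storage suspect" if storage_looks_like_bottleneck(depth) else "look elsewhere")
```

With these made-up samples the depth is well under one outstanding I/O, which is the "shallow queue" case described above: the place to look is the CPU, not the storage shelf.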

Data Friction Fiction
As for the "data friction" woo that is kicking around in a lot of Silicon Valley start-ups these days, NVMe again is not some sort of miracle cure. "Friction fiction" holds that any "unnecessary data movement" – including tiering of less frequently accessed data to higher capacity, lower cost storage and data replication to secondary storage for protection against loss or corruption – creates latency in the application environment that cannot be tolerated.

Advocates of this point of view want a "flat" or single tier storage infrastructure composed entirely of NVMe flash. Data gets written to the NVMe flash solid state disks and, when it is no longer accessed or updated with any frequency, the SSD is powered down. The data stored on the powered down drive is supposed to be resilient (remember, NVMe means non-volatile, so cutting power should not cause any data loss), meaning that the data it hosts is "sheltered in place."

Anyone who has had experience with flash (or any other storage medium besides tape or optical) knows that powering down the storage device after protracted use is too often accompanied by a device failure if and when it is powered up again. Opinions may vary, of course, but the long-term non-volatility of NVRAM has yet to be proven empirically. It hasn't been around long enough to test 30-year resiliency claims.

And again, the idea of "friction" from data movement has been largely disproven after several years of being cast as the hobgoblin of poor VM performance. VMware first tried to "offload" I/O-intensive tasks like data migration and data replication to array controllers (remember VMware's vStorage APIs for Array Integration, or VAAI) as a solution to the theory that "data friction" from using the server CPU to manage data movements was introducing latency into the system. At the end of the day, VAAI did not resolve the issue because the issue had nothing whatsoever to do with data friction.

NVMe in a Nutshell
NVMe is the shiny new thing in storage, with another specification for NVMe over Fabrics having been released this past summer. It has the 800-pound gorilla, Intel, behind it and has successfully sucked all of the oxygen out of discussions of other proprietary stacks and drivers for connecting flash to PCIe. That means we have substituted one proprietary method for connecting flash to systems for a bunch of proprietary methods articulated by a lot of start-ups in the space a decade ago. De facto or de jure, NVMe is a standard that most system developers are treating as an ironclad design metric.

The big issue is that NVMe has happened before it was really needed, rather like 100 GbE. It may be well suited to applications like very large read caches, since flash really isn't optimized for writes. It may also provide the means to create huge Big Data analytics databases in-memory. Analytics DBs tend to be read intensive, making their data ideally suited to flash memory storage. For online transaction processing systems, which tend to be more write intensive, DRAM storage may actually provide a better fit.

In any case, NVMe flash storage will eventually find a smart use case, but its arrival really doesn't change the fact that hard disk is optimized for randomized reads and writes, and that tape has the capacity, resiliency and linear write speed that make it ideal for long term archival storage. Smart storage architecture requires an "All of the Above" strategy. Anyone who seeks an all-silicon datacenter is just being silly.

About the Author

Jon Toigo is a 30-year veteran of IT, and the Managing Partner of Toigo Partners International, an IT industry watchdog and consumer advocacy. He is also the chairman of the Data Management Institute, which focuses on the development of data management as a professional discipline. Toigo has written 15 books on business and IT and published more than 3,000 articles in the technology trade press. He is currently working on several book projects, including The Infrastruggle (for which this blog is named) which he is developing as a blook.