NVMe for Beginners

Last year I did a presentation on NVMe for Beginners along with Craig Waters for vBrownBag at the Melbourne VMUG UserCon. It was a daunting experience as it was a new cohort to present in front of and NVMe is a topic I had no expertise in. It’s something I wanted to learn more about and I thought that doing a presentation on it would mean I’d have to pull my finger out and really get down to understanding it. Nothing like a bit of pressure to learn something :-). Thankfully with Craig I had someone that had been through the mill a few times when it came to presenting. His mentorship and guidance made the presentation so much easier, and it gave me the confidence to do the presentation on my own a few months later at a normal VMUG.

Unfortunately the presentation contained proprietary information so it cannot be circulated online but I’ll run through the premise of the presentation and hopefully provide a brief introduction to NVMe. If you want to get the best understanding possible about NVMe I cannot recommend enough that you take time to read J Metz’s article on Cisco blogs NVMe for Absolute Beginners. It’s a phenomenal breakdown on NVMe and it’s so well written that even I was able to comprehend it.

I’ll run through the presentation as much as possible here. So, why NVMe?

Another reason to consider NVMe is that there’s a bottleneck brewing. As drive capacities increase, the IOPS each drive can deliver stay roughly the same, so the number of IOPS per GB drops. For example, a Seagate 200GB dual-port 12Gb/s SAS SSD with 200,000 IOPS provides 1,000 IOPS per GB, while a Seagate 15TB dual-port 12Gb/s SAS SSD with the same 200,000 IOPS provides only about 13 IOPS per GB. Historically, Serial Attached SCSI (SAS) storage arrays provide a dual path to each SSD. NVMe, by contrast, was designed as a protocol from the ground up with the requirements of Flash storage in mind, which removes the bottleneck of dual serial paths to each SSD. NVMe running over PCIe has been designed to provide:

* Low Latency
  * Complex software introduces latency
  * Reduced command set
  * Simplified software stack, reducing latency
* High Concurrency
  * Multi-core CPUs with a dedicated queue per SSD
  * A memory-based architecture allows parallel access to multiple devices
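Coming back to the IOPS-per-GB point above, the arithmetic is easy to sanity-check. A minimal sketch, using the drive figures quoted earlier (the function name is just for illustration):

```python
# Rough IOPS-per-GB arithmetic for the two example SAS SSDs quoted above.
def iops_per_gb(iops, capacity_gb):
    return iops / capacity_gb

small = iops_per_gb(200_000, 200)     # 200 GB drive -> 1000.0 IOPS per GB
large = iops_per_gb(200_000, 15_000)  # 15 TB drive  -> ~13.3 IOPS per GB

print(small, round(large, 1))
```

Same drive performance, 75x the capacity: the per-GB IOPS budget collapses, which is the bottleneck brewing.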

During the presentation I leveraged the concepts from Dr. J’s article and used the analogy of an airport with three runways. There’s one air traffic controller who can only focus on one runway at a time, essentially wearing blinkers. This represents the queue only being able to process one transaction at a time. The runway length represents different areas on the disk platter. While a plane is landing and taxiing along the runway, all other planes waiting to land are held in a holding pattern.

Next up was looking at Non-Volatile Memory still using SCSI. The number of queues and commands per queue stays the same. In this case the runways have been replaced with a gigantic car park. There’s still only one controller working the queue, but because there are no runways it’s much faster to process each plane landing. It does take a leap of faith to imagine vertical landing being possible for every plane, à la the Harrier Jump Jet.

So what about Non-Volatile Memory with NVMe? Well, now we have 64,000 queues with 64,000 commands per queue. Following on from the example, it basically provides 64,000 air traffic controllers to handle the planes that need to land, so everything just drops out of the sky in one gigantic batch.
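The scale difference behind that analogy is easy to quantify. A quick sketch, using the commonly quoted round numbers (SAS is typically described as one queue of around 254 commands; the NVMe figures are the 64,000 x 64,000 quoted above):

```python
# Outstanding-command capacity: single-queue SAS vs multi-queue NVMe.
sas_queues, sas_depth = 1, 254          # typical SAS: one queue, ~254 commands
nvme_queues, nvme_depth = 64_000, 64_000  # NVMe: 64K queues, 64K commands each

print(sas_queues * sas_depth)     # 254 commands in flight
print(nvme_queues * nvme_depth)   # 4,096,000,000 commands in flight
```

With a dedicated queue pair per CPU core, each core can keep its own queue full without contending with the others, which is where the High Concurrency benefit comes from.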

During the presentation we also touched on NVMe over Fabrics. This was a focus for Craig, and I honestly can’t do it justice here, so I’d recommend returning to the guru, Dr. J Metz, who has put together an absolutely awesome Program of Study on NVMe which also covers the NVMe over Fabrics concepts. If you want to learn more about NVMe, work through his study plan; each part builds upon the knowledge imparted in the previous one.

So just to briefly wrap up, what are the advantages of NVMe?

* Better performance
* Higher throughput
* Lower latency