Interview: Jeff Bonwick on the Secret Sauce behind DSSD

The 32nd International Conference on Massive Storage Systems and Technology (MSST 2016) takes place next week in Santa Clara. The premier conference for massive-scale storage systems, MSST will feature a keynote by Jeff Bonwick from DSSD. We caught up with Jeff Bonwick to learn more about this exciting technology for HPC.

insideHPC: There are a lot of flash-based storage solutions. What would you say the secret sauce is for the DSSD D5?

Jeff Bonwick, Founder, DSSD

Jeff Bonwick: Everything about the D5 was built from first principles. We started by looking at the data sheet for a single NAND flash chip and calculating what it was capable of in terms of IOPS and bandwidth. We then multiplied that by the number of flash chips that would be in a typical array of SSDs, and the numbers were staggering — literally 100 times faster than any product on the market. At a time when 100K IOPS and 1 GB/s were respectable numbers, we saw that the raw media was capable of 10M IOPS and 100 GB/s. That seemed like a huge opportunity. The challenge was finding some way to actually deliver that raw performance to applications. Doing so required a new approach to just about everything, both hardware and software.
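The first-principles estimate above can be sketched as simple arithmetic. The per-chip figures and chip count below are illustrative assumptions chosen to reproduce the ballpark numbers quoted in the interview, not DSSD datasheet values:

```python
# Back-of-envelope scaling of raw NAND capability to array totals.
# All per-chip numbers here are assumptions for the sketch.
chip_read_iops = 500       # assumed random-read IOPS per NAND chip
chip_read_bw_mb_s = 5      # assumed sequential-read MB/s per chip
chips_in_array = 20_000    # assumed chip count across a typical array of SSDs

total_iops = chip_read_iops * chips_in_array
total_bw_gb_s = chip_read_bw_mb_s * chips_in_array / 1_000

print(f"~{total_iops / 1e6:.0f}M IOPS")   # ~10M IOPS
print(f"~{total_bw_gb_s:.0f} GB/s")       # ~100 GB/s
```

Even with modest per-chip assumptions, the aggregate lands two orders of magnitude above the 100K IOPS / 1 GB/s figures that were typical at the time, which is the gap the interview describes.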

The D5’s 36 flash modules each contain 512 NAND die, but a flash module is not an SSD; our Flood software actually manages all 18,432 NAND die as independent, concurrent I/O devices. In addition to providing massive IOPS and bandwidth, having that many discrete devices enables new, more powerful data protection algorithms like Cubic RAID, which can recover from many more errors than linear RAID. Every flash module is directly connected to every client port over a full-mesh PCIe fabric, the largest ever built, so that data can be DMA’ed directly from a flash module to client memory with no CPU involvement. The CPUs are just traffic cops, keeping track of which flash chips contain which pieces of user data. The biggest challenge was making the software fast enough to drive the hardware. We use a total of 32 CPU cores to drive 10M IOPS, which implies a budget of just 3 microseconds per IOP. A few thousand instructions, a dozen cache misses… time’s up!
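The per-IOP software budget mentioned above follows directly from the stated figures; a quick check (the 2.5 GHz clock is an assumed figure for illustration):

```python
# Per-core CPU time budget implied by 32 cores driving 10M IOPS.
cores = 32
target_iops = 10_000_000

iops_per_core = target_iops / cores       # 312,500 IOPS per core
budget_us = 1e6 / iops_per_core           # microseconds available per IOP

# At an assumed 2.5 GHz clock, that budget is only a few thousand cycles.
cycles_per_iop = budget_us * 1e-6 * 2.5e9

print(f"{budget_us:.1f} us per IOP")      # 3.2 us
print(f"~{cycles_per_iop:.0f} cycles")    # ~8000 cycles
```

A budget of roughly 3 microseconds (about 8,000 cycles on the assumed clock) is consistent with the "few thousand instructions, a dozen cache misses" framing in the interview.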

From an end-user perspective, of course, none of this is visible. What’s exciting for our customers is getting the performance of server-attached NVMe flash with the sharing, scale, density, and reliability of all-flash arrays.

insideHPC: What were the design goals of the DSSD D5 in terms of storage performance and density?

Jeff Bonwick: Our mission statement was four words: fastest storage on earth. That was our singular goal from day one, which gave the team incredible focus and clarity. Whenever we had to make a tradeoff between performance and something else, performance always won. Always. And it just so happens that when you aim for performance, density comes along for the ride because the more flash chips you have working in parallel, the faster it goes.

insideHPC: We first heard of DSSD here in the HPC community when the folks at TACC adopted it for their Wrangler supercomputer. What did you come away with from that deployment? Will we see purpose-built HPC products coming from EMC?

Jeff Bonwick: TACC has been a great partner. We were truly honored when Wrangler won HPCwire’s Readers’ Choice Award for Best Data Intensive System, because it affirmed what we were hearing from individual researchers who were using the Wrangler system. Some problems in computational biology, for example, have not just gotten faster — they’ve gone from unsolvable to solvable. We’ve often faced skepticism about whether anyone really needs this level of performance. Well, consider that the Wrangler deployment is not just one D5, it’s ten — which add up to 100M IOPS and 1 TB/s. And there’s still hunger for more. I don’t think there can ever be “enough” performance, because that would mean that humans have stopped trying to solve harder problems.

insideHPC: DSSD was acquired by EMC before you brought a product to market. How did being part of a much larger enterprise help you deliver your product to market?

Jeff Bonwick: Becoming part of EMC allowed us to focus on finishing the product and doing it right. We no longer had the myriad distractions of trying to get a business off the ground. We weren’t under pressure to ship something, anything, before funding ran out. We could partner with EMC’s world-class QA team. And if buying more equipment could help us finish faster, we could trade money for time.

insideHPC: Can you give us a preview of what you’ll be discussing at the MSST Storage Conference?

Jeff Bonwick: I’m going to describe both the challenges and the opportunities that flash and its successors present, how we addressed those in the D5, and where I see the industry going.
