Oracle Blog

inside the sausage factory

Wednesday Aug 18, 2010

I joined the Solaris Kernel Group in 2001 at what turned out to be a remarkable
place and time for the industry. More by luck and intuition than by premonition,
I found myself surrounded by superlative engineers working on revolutionary
technologies that were the products of their own experience and imagination
rather than managerial fiat. I feel very lucky to have worked with Bryan and
Mike on DTrace; it was amazing that just down the hall our colleagues reinvented
the operating system with Zones, ZFS, FMA, SMF and other innovations.

With Solaris 10 behind us, lauded by customers and pundits, I was looking for
that next remarkable place and time, and found it with Fishworks. The core dozen
or so are some of the finest engineers I could hope to work with, but there were
so many who contributed to the success of the 7000 series: the executives who
enabled our fledgling skunkworks in its nascent days, our Solaris colleagues
building fundamental technologies like ZFS, COMSTAR, SMF, networking, I/O, and
IPS, and the OpenStorage team who toiled to bring a product to market,
educating us without deflating our optimism in the process.

I would not trade the last 9 years for anything. There are many engineers who
never experience a single such confluence of talent, organizational will, and
success; I'm grateful to my colleagues and to Sun for those two opportunities.
Now I'm off to look for my next remarkable place and time beyond the walls of
Oracle. My last day will be August 20th, 2010.

Thank you to the many readers of this blog. After six years and 130 posts I'd never think of giving it up. You'll be able to find my new blog at dtrace.org/blogs/ahl (comments to this post are open there); I can't wait to begin chronicling my next endeavors. You can reach me by email here: my initials at alumni dot brown dot edu. I look forward to your continued comments and emails. Thanks again!

Tuesday Aug 17, 2010

This year's Flash Memory Summit got me thinking about our use of SSDs over the years at Fishworks. The picture on the left is a visual history of our SSD evals in rough chronological order, from the oldest at the bottom to the newest at the top (including some that have yet to see the light of day).

Early Days

When we started Fishworks, we were inspired by the possibilities presented by ZFS and Thumper. Those components would be key building blocks in the enterprise storage solution that became the 7000 series. An immediate deficiency we needed to address was how to deliver competitive performance using 7,200 RPM disks. Folks like NetApp and EMC use PCI-attached NV-DRAM as a write accelerator. We evaluated something similar, but found the solution lacking because it had limited scalability (the biggest NV-DRAM cards at the time were 4GB), consumed our limited PCIe slots, and required a high-speed connection between nodes in a cluster (e.g. IB, further eating into our PCIe slot budget).

The idea we had was to use flash. None of us had any experience with flash beyond cell phones and USB sticks, but we had the vague notion that flash was fast and getting cheaper. By luck, flash SSDs were just about to be where we needed them. In late 2006 I started evaluating SSDs on behalf of the group, looking for what we would eventually call Logzilla. At that time, SSDs were getting affordable, but were designed primarily for environments such as military use where ruggedness was critical. The performance of those early SSDs was typically awful.

Logzilla

STEC, still Simpletech in those days, realized that their early samples didn't really suit our needs, but they had a new device (partly due to the acquisition of Gnutech) that would be a good match. That first sample was Fibre Channel and took some finagling to get working (memorably, it required metric screws of an odd depth), but the Zeus IOPS, an 18GB 3.5" SATA SSD using SLC NAND, eventually became our Logzilla (we've recently updated it with a SAS version for our updated SAS-2 JBODs). Logzilla addressed write performance economically and scalably, in a way that also simplified clustering; the next challenge was read performance.

Readzilla

Intent on using commodity 7,200 RPM drives, we realized that our random read latency would be about twice that of 15K RPM drives (duh). Fortunately, most users don't access all of their data randomly (regardless of how certain benchmarks are designed). We already had much more DRAM cache than other storage products in our market segment, but we thought that we could extend that cache further by using SSDs. In fact, the invention of the L2ARC followed a slightly different thought process: seeing the empty drive bays in the front of our system (just two were used as our boot disks) and the piles of SSDs lying around, I stuck the SSDs in the empty bays and figured out how we'd use them.
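The "about twice" figure comes from simple arithmetic. This is a minimal Python sketch. It assumes random-read latency is dominated by average rotational latency, meaning on average the platter must turn half a revolution before the requested sector reaches the head. It ignores seek time and queuing, so it approximates the ratio rather than modeling any particular drive.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: half of one revolution.

    Assumption: seek time and queuing are ignored; only rotation is counted.
    """
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms per minute / revolutions per minute
    return ms_per_revolution / 2.0


slow = avg_rotational_latency_ms(7200)   # ~4.17 ms
fast = avg_rotational_latency_ms(15000)  # 2.00 ms
print(f"7,200 RPM: {slow:.2f} ms, 15K RPM: {fast:.2f} ms, ratio {slow / fast:.2f}x")
```

The ratio comes out to roughly 2.08x. Seek times on 7,200 RPM drives are typically longer as well, so this rotational-only estimate, if anything, understates the gap.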

It was again STEC who stepped up to provide our Readzilla, a 100GB 2.5" SATA SSD using SLC flash.
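The idea of extending a DRAM cache with SSDs can be illustrated with a toy two-tier read cache. This is a minimal sketch under stated assumptions and not how the L2ARC is actually implemented:

- Both tiers use plain LRU, whereas ZFS's ARC does not.
- Blocks evicted from the DRAM tier are demoted to the SSD tier rather than discarded.
- An SSD hit promotes the block back to DRAM, which keeps the two tiers exclusive.
- `TwoTierCache` and `read_disk` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable


class TwoTierCache:
    """Hypothetical two-level read cache: a small fast tier standing in for
    DRAM, backed by a larger tier standing in for SSDs. Both tiers are plain
    LRU, a simplification (ZFS's ARC is not LRU)."""

    def __init__(self, dram_blocks: int, ssd_blocks: int,
                 read_disk: Callable[[Hashable], bytes]) -> None:
        self.dram: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.ssd: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.dram_blocks = dram_blocks
        self.ssd_blocks = ssd_blocks
        self.read_disk = read_disk  # slow path, standing in for the 7,200 RPM disks
        self.stats: Dict[str, int] = {"dram": 0, "ssd": 0, "disk": 0}

    def _insert_dram(self, key: Hashable, data: bytes) -> None:
        # New or promoted blocks enter the DRAM tier as most recently used.
        self.dram[key] = data
        if len(self.dram) > self.dram_blocks:
            # Demote the least recently used DRAM block to the SSD tier
            # instead of discarding it.
            old_key, old_data = self.dram.popitem(last=False)
            self.ssd[old_key] = old_data
            if len(self.ssd) > self.ssd_blocks:
                # The SSD tier is full too: drop its least recently used block.
                self.ssd.popitem(last=False)

    def read(self, key: Hashable) -> bytes:
        if key in self.dram:
            self.stats["dram"] += 1
            self.dram.move_to_end(key)  # mark as most recently used
            return self.dram[key]
        if key in self.ssd:
            self.stats["ssd"] += 1
            data = self.ssd.pop(key)  # promote: remove from the SSD tier
        else:
            self.stats["disk"] += 1
            data = self.read_disk(key)
        self._insert_dram(key, data)
        return data


# Usage: block 1 falls out of the two-block DRAM tier but is served from the
# SSD tier instead of going back to disk.
cache = TwoTierCache(dram_blocks=2, ssd_blocks=4,
                     read_disk=lambda k: f"block{k}".encode())
for k in [1, 2, 3, 1, 3]:
    cache.read(k)
print(cache.stats)  # {'dram': 1, 'ssd': 1, 'disk': 3}
```

The point of the second tier is that a working set too large for DRAM can still avoid the slow random reads from disk, as long as it fits in the combined capacity.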

Next Generation

Logzilla and Readzilla are important features of the Hybrid Storage Pool. For the next generation, expect the 7000 series to move away from SLC NAND flash. It was great for the first generation, but other technologies provide better $/IOPS for Logzilla and better $/GB for Readzilla (while maintaining low latency). For Logzilla we think that NV-DRAM is a better solution (I reviewed one such solution here), and for Readzilla MLC flash has sufficient performance at much lower cost, and ZFS will be able to ensure its longevity.