More on Fusion Drive: How it works, and how to roll your own

It's not Intel SRT, it's not file-based, and it works on OS X right now.

Two blog posts by Tumblr user Jollyjinx have shed some more light on the inner workings of Apple's Fusion Drive. Announced last week at Apple's event in San Jose, Fusion Drive marries a solid-state disk and a spinning hard disk drive together into a single volume, relying on the speed of the SSD to accelerate all writes and reads on the most often-used files and the size of the HDD to hold the much larger mass of less often-referenced files.

Based on Phil Schiller's remarks at the event, I speculated that Fusion Drive was a software-based, file-level automated tiering solution. A Fusion Drive-equipped Mac will come with a 128GB SSD and a much larger hard disk, from 1 to 3 terabytes. Floor reports from the event revealed that the two disks are visible as a single volume, with the total amount of space in the volume equal to the two drives' aggregated capacities. Schiller's comments indicated that Fusion Drive keeps track of what files and applications are being frequently read, physically moving (or "promoting," as it's commonly called in enterprise tiering solutions) those files and applications from the HDD to the faster SSD. At the same time, files and applications on the SSD which haven't been referenced in a while are moved back down ("demoted") to the HDD, to make room for more files to be promoted.

Many questions lingered, though, in the absence of any real technical info from Apple (and its Fusion Drive tech document provides very few hard details on the underlying functionality). Is Fusion Drive really a tiering technology, actually moving the data, or is it more of a caching solution? Does it rely on Intel's Smart Response Technology, which is available in Ivy Bridge chipsets like those in the new round of Fusion Drive-equipped Macs? Does it use the volume management features Apple introduced last year in Core Storage? Does it move whole files or just pieces of files? How does it keep track of what it's moving? Will it work on older Macs, or only newer Ivy Bridge Macs with Apple-provided SSDs?

BYO Fusion

Some of those questions are now answered. In the first of two blog posts, Jollyjinx sets out to build his own Fusion Drive using a 120GB OCZ Vertex 2 connected to his Mac's SATA bus and a USB-attached 750GB hard disk drive.

Core Storage, explained by Ars's John Siracusa in his OS X 10.7 Lion review, is used as the logical volume manager to tie the two physical devices together into a single volume group. Once the volume group is created, Jollyjinx creates a usable HFS+ volume inside of it. This is all accomplished using diskutil, the command line version of Disk Utility, since the graphical version doesn't yet support the necessary commands.
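The sequence Jollyjinx describes can be sketched with diskutil's Core Storage verbs. The disk identifiers and names below are placeholders (they will differ on every machine), and these commands erase both drives, so treat this as an illustration rather than a recipe:

```shell
# Combine the SSD (here disk1) and the HDD (here disk2) into a single
# Core Storage logical volume group. WARNING: destroys all data on both.
diskutil cs create "FusionGroup" disk1 disk2

# The create command prints the new logical volume group's UUID
# (also visible afterward via `diskutil cs list`). Use it to carve a
# journaled HFS+ volume out of the entire group.
diskutil cs createVolume <lvgUUID> jhfs+ "Fusion" 100%
```

Once the jhfs+ volume mounts, it behaves as one disk with the combined capacity of both members.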

Surprisingly, no additional configuration was necessary for the volume to begin exhibiting Fusion Drive-like tendencies. Jollyjinx created 140GB of dummy files and directories on the volume using the dd command, and the system automatically placed about 120GB of those on the SSD before dropping the rest onto the HDD (easily observable by the drop in write speeds as dd's output was redirected from SSD to HDD). After the files were all in place, Jollyjinx then triggered a whole bunch of read activity on the volume, using the dd input file flag to constrain the reads to the directories which had landed on the HDD.
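The fill test can be reproduced in miniature with dd. The paths and sizes here are scaled-down stand-ins (Jollyjinx wrote roughly 140GB of 1MB-and-up files); on a Fusion-style volume, watching the write rate as the files accumulate is what reveals the SSD-to-HDD spillover point:

```shell
# Fill a scratch directory with dummy files. On a fused volume the early
# writes land on the SSD at SSD speeds; the throughput drops sharply once
# the SSD is full and dd's output starts spilling onto the HDD.
WORKDIR=$(mktemp -d)
for i in 1 2 3; do
    # /dev/zero supplies an endless stream of zero bytes;
    # bs=1048576 count=1 makes each file exactly 1MB (portable across
    # OS X's bs=1m and Linux's bs=1M spellings)
    dd if=/dev/zero of="$WORKDIR/dummy$i" bs=1048576 count=1 2>/dev/null
done
ls -l "$WORKDIR"
```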

By monitoring the throughput of both the HDD and SSD at the device level with iostat, it's possible to track what happens next. As soon as Jollyjinx stops the reads and the file system goes idle, the SSD lights up with write activity, sending about 14GB worth of writes from the HDD to the faster disk. After another hour of re-reading the same directories as before, they begin to show SSD read speeds instead of USB-attached HDD speeds.
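On OS X, that per-device monitoring looks roughly like the following; the disk identifiers are placeholders for the SSD and HDD members of the volume group:

```shell
# Print per-drive KB/t, tps, and MB/s for both members once per second.
# During a promotion pass, the HDD column shows sustained reads while the
# SSD column shows a matching burst of writes.
iostat -d -w 1 disk1 disk2
```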

Intel SRT does not handle writes this way—whether it's operating as write-back or write-through cache, SRT mirrors writes (immediately or within a short amount of time) down to the hard disk, which is not the observed behavior. Plus, as has been noted, SRT currently doesn't work with SSDs larger than 64GB. It is absolutely clear that Fusion Drive does not use SRT.

Based on these findings, Fusion Drive is indeed a base operating system feature, either contained within Core Storage or built into OS X 10.8.x (Jollyjinx notes at the bottom that he's using 10.8.2). It appears that Fusion Drive detects the SSD-ishness of a drive based on SMART info read across the SATA bus, though it's possible that Apple might be using Microsoft's SSD detection method and simply testing attached drives' throughput. If a Core Storage volume contains an HDD and an SSD, Fusion Drive appears to be automatically activated.

Block- or file-based?

Another question, though, is whether or not Fusion is "block" or file-based—that is, does it promote entire files, or merely promote the parts of files that are being referenced? The difference is important: if you have a 50GB Aperture library full of photos, for example, or a big multi-gigabyte virtual machine, will Fusion Drive promote the entire thing or just the parts of it that you're repeatedly reading?

Jollyjinx tackles this in his second post, again using dd to only read the first megabyte of several 100MB files located on the HDD side of his home-grown Fusion Drive. After giving Fusion Drive some idle time to work, telling dd to read the entirety of the 100MB files generates significant IO on both the SSD and the HDD—the first megabyte of each file is coming off the SSD, and the rest is coming off the HDD.
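The partial-read trick relies on dd's count flag, which stops the copy after a fixed number of blocks. A minimal sketch (the file name is illustrative, and /dev/zero stands in for real data):

```shell
# Create a stand-in 100MB file, then read only its first megabyte --
# the access pattern used to demonstrate sub-file promotion.
FILE=$(mktemp)
dd if=/dev/zero of="$FILE" bs=1048576 count=100 2>/dev/null

# count=1 makes dd stop after a single 1MB block, so only the head of
# the file registers as "hot"; the remaining 99MB are never touched.
dd if="$FILE" of=/dev/null bs=1048576 count=1 2>/dev/null
```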

Clearly, Fusion Drive is operating at the "sub-file" level, which is good news. I had speculated that it was purely a file-based technology, which does have some advantages, but sub-file neatly works around the disadvantages that file-based tiering brings when working with very large files that exhibit high rates of change.

Also settled with this experiment is the question of timing. Fusion Drive behaves itself, waiting for uninterrupted idle time in order to do its tiering rather than stealing IOs away from the user while the system is active. It's not an instantaneous technology (nor should it be, since the user's reads and writes should always be prioritized over system housekeeping activities like this). There are still questions about the nature of the data movement—are the sub-file chunks promoted by being moved, or are they copied?—but the question is largely academic at this point, since even if the chunks' bits still exist on the HDD after being promoted to SSD, it's clear that their canonical location changes. This makes Fusion Drive fundamentally a tiering technology—not a cache.

We have many more questions about Fusion Drive, and we hope to get some answers soon. Our Fusion Drive-equipped Mac Mini has shipped and should be arriving within the next few days. We'll dive deep once it's here!

Promoted Comments

JollyJinx has further revealed that a DIY Fusion Drive also works with a ZFS-formatted volume. As JollyJinx and Ars Technica have both established, a Fusion Drive works at the block level, as evidenced by the fact that the command-line dd utility (a block-level copy tool) produces the expected behavior. Since non-HFS+ volumes have now been shown to work, it should in theory be possible to use FAT32, exFAT, ext3, or NTFS volumes as well, at least while running in OS X.

At this point I have not seen anything that would suggest it would work when natively booted into Linux or Windows. In fact, I would very much doubt that it would, since Core Storage is an OS X-only feature.

However, Parallels running with a Boot Camp volume rather than a disk image could, in theory, work.

Perhaps Lee could compare FAST (or other similar enterprise-y tech; I'm not trying to push EMC tech, I just happen to know more about it) to Fusion? I know that he has a lot of experience with large SAN stuff.

I was a presales engineer at EMC focusing on core storage for a couple of years, and before that I did storage architecture at a mostly-EMC shop (Boeing), so I'm pretty good on the ins & outs of FAST and FAST VP, actually.

The primary differences are that big tiering solutions like FAST or Dell Compellent's Fluid Data Architecture are designed to work on systems with gobs of cache and gobs of IO ability. Plus, if I'm remembering right, FAST really works best if you've got three tiers instead of two (a fast but small SSD tier, a larger Fibre Channel or SAS middle tier for workhorse stuff, and a big SATA tier for archiving--FAST VP with just SSD & SATA is impractical, and FAST VP with just SSD & FC is too expensive).

Enterprise solutions also assume that you've got availability, support, and backup safety nets in place. You don't care about the effects of tiering on the SSD tier's write amplification, for example, because those SSDs are all under a maintenance agreement--the second one of them starts acting flaky, the vendor just shows up and replaces it. You've also got more layers of abstraction in the mix--it's not just one SSD and one HDD, but lots of HDDs and lots of SSDs, organized into pools or some other kind of logical construct, with volumes thin-provisioned out of those pools. Individual component failure isn't an issue.

The watermarks for determining how and when data are to be tiered are going to be fundamentally different, as will the actual granularity of tiering. FAST VP was doing its tiering in 1 GB chunks on VNX when I left EMC; the Symmetrix FAST VP flavor was infinitely more customizable and powerful, but still operated on...um...some number of 768 KB tracks, but I don't remember the exact chunk size.

Ultimately, Fusion Drive is a two-disk consumer solution designed to give some of the same benefits. It uses the file system and a logical volume manager to approximate a big enterprise tiering solution, but lacks many of the uptime-preserving things which make tiering an OK thing to use in the enterprise.

This doesn't quite cover everything. From the sound of it, I'm guessing that you can't use Fusion Drive on any partition that needs to be accessible from Windows. So if you want to access your documents from both Windows and OS X (which, I'd imagine, most people who are dual booting would), no Fusion Drive for you. Unless of course Apple are planning to release Windows drivers, which I doubt.

Speculation, which I'll confirm as soon as I have one in front of me: You create a Windows partition on the spinny disk using Boot Camp Assistant, and when you boot into Boot Camp, the EFI-based BIOS emulation which allows Windows to boot is shown that partition as the primary bootable one and goes from there. The SSD will appear as a separate disk, just like multiple OS X HDDs do in Boot Camp today.

One thing I want to confirm; the true Fusion drive is a single device, correct? Jollyjinx is running two separate physical drives, which is definitely more prone to failure (especially with an external USB device in the mix), but are we considering Fusion to be somehow less reliable when it's a single package that contains flash and spindle in one case? I'm sure the mixture has some impact on reliability, but is it expected to be a noticeable degradation in MTBF?

I don't think a true Fusion drive is a single device. This is pure speculation, but it's only available on hardware that supports two-drive configurations. During the Apple event, they showed the internals of the new iMac, and there was an area for the SSD and a separate area for the HDD.

It would be nice if all the naysayers from the previous Fusion thread would let us know what seasoning they would like on their crow, but I'm not really expecting it. My Mini has the upgraded video card, so there's no room for a 2nd drive in there, but I like this solution a lot. I already run 6TB of external storage with Time Machine backing up to it, so if my next Mac has a Fusion drive, it will be business as usual for me. Keep copying to the external, just do everything way faster.

Fascinated that this works with external USB drives. I have a Mac Mini that I've swapped in a 128GB SSD, and have a 1TB external via FW800. The SSD is a bit small, and I frequently have to do some cleanup work; a Fusion setup could work really nicely!

My big concern is that occasionally (maybe once a month) the external drive doesn't seem to wake quickly enough when waking the Mac. The result is a dialog warning that the drive was not properly ejected. This is currently just a minor nuisance since it comes right back.

However, when I originally set it up, I had moved our home folders to the external drive. The first time the glitch happened, OS X reconstructed the home folders on the main drive with no content, and refused to revert back. In effect, it "lost" all of our files and settings, even though I could still see the data. A full Time Machine restore was needed, plus a bit of time to rearrange. Home folders on the SSD, separate folders for big data on the external. No data was lost, but it was scary and took several hours to figure out and fix.

I'd be concerned that Fusion would not handle that kind of glitch well...!

I'll be interested to see how it compares to ZFS with an L2ARC/ZIL. It'll be lighter on memory of course, though less failure tolerant. It's definitely interesting to see how Apple is putting some of their Core technologies to work.

To add to my previous comment... A single spinning disk is actually quite reliable, and a single SSD even more so. But using both in tandem along with a software "controller" leaves many points of failure.

I would want to know how the system behaves if you remove the SSD from the mix or the HD from the mix, especially during writes, etc...

In a RAID setting you gain reliability (obviously talking mirroring or other redundant RAID levels). But from my understanding this is no different than when people blasted the ultrabook that had two SSDs in a striped RAID set. You have more points of failure... even more here, because you have your SATA controller, a software controller, LVM, etc...

But you have a separate backup drive running with Time Machine, so none of this is a real concern, right?

I really like the competition between Windows and OS X. Both are making great strides to take advantage of the latest hardware with their OSes. Even though OS X has a small market share, I am convinced many of the lower-level features we are seeing in Windows 8 (File History, Storage Spaces, Hyper-V) would not exist in the consumer version if not for pressure from OS X.

It certainly doesn't look to me like the end of the desktop is anytime soon.

Will you also check that you can split a factory Fusion drive into its component parts (ie separate SSD and HDD)? I assume this is a matter of just deleting the Core Storage volume and reinstalling OS X, but it would be nice to have it confirmed.

Didn't they also say in the keynote that the OS itself would *always* be on the SSD drive? So I wonder, is this happening by the mere fact that OS files are accessed frequently enough to always be there, or is there actually a way of indicating to the Fusion subsystem that "this file's blocks should always be on the SSD". I'd be curious to see someone test if they can actually read/write so much data that OS files get sent down to the HDD tier.

Will you also check that you can split a factory Fusion drive into its component parts (ie separate SSD and HDD)? I assume this is a matter of just deleting the Core Storage volume and reinstalling OS X, but it would be nice to have it confirmed.

I can't think of any reason why that wouldn't work, but sure, I can give it a shot.

Didn't they also say in the keynote that the OS itself would *always* be on the SSD drive? So I wonder, is this happening by the mere fact that OS files are accessed frequently enough to always be there, or is there actually a way of indicating to the Fusion subsystem that "this file's blocks should always be on the SSD". I'd be curious to see someone test if they can actually read/write so much data that OS files get sent down to the HDD tier.

There's almost certainly some method of "pinning" that the file system can use to ensure things stay on the SSD or HDD, because that is pretty much what Phil said in his keynote.

I wonder if all of the housekeeping activity has an appreciable effect on battery life.

From all I can see thus far, it will actually extend battery life on average, by reducing the workload on the much more "expensive"-to-access mechanical hard disk, which needs to accelerate its heads to reach different chunks of data. I doubt that Fusion will allow the hard disk to be parked completely very often, but even that might happen now at long last – thus far the spindle motors have run permanently as long as the machine was awake.

The copying of blocks between the two component drives will use a bit of power, but using the SSD instead of the HD after that should save a lot more on average.

Another question, though, is whether or not Fusion is "block" or file-based—that is, does it promote entire files, or merely promote the parts of files that are being referenced? The difference is important: if you have a 50GB Aperture library full of photos, for example, or a big multi-gigabyte virtual machine, will Fusion Drive promote the entire thing or just the parts of it that you're repeatedly reading?

Isn't an Aperture library (as well as most applications and other libraries) a folder containing many individual files, at least from a file system perspective? Only at the highest level of the user GUI is the set of files displayed as a single file.

Please actually read the article and the previous one on this topic. This moves frequently used files from the hard drive to the SSD as they see more use, so you get faster access to the things you use most often. Less-used bits will float back to the HDD. Speed for the Fusion drive is only a tick below a straight SSD as well.

– A cache is usually smaller than the SSD portion of a Fusion Drive, so its impact is smaller as well. From the looks of it, you could configure a Fusion Drive with a 768GB SSD plus a 3TB HD for an aggregate capacity of 3.75TB, with almost every access coming from the SSD.

– Maintaining a cache is done inline while you're accessing the actual data, so under load the cache can actually be slower than a Fusion Drive accessing the SSD or HD directly without having to fill a cache simultaneously. The Fusion Drive apparently does its optimization at idle time, so it may steal less I/O throughput and fewer CPU cycles while actually being accessed (it only needs to run bookkeeping to keep track of optimization potential).

+ If only a cache fails, the actual drive may(!) still be intact as long as the damaged cache hasn't dragged the volume structure down with it already (which it might do, however). A Fusion Drive is gone if either component should fail, so you should really use Time Machine. But the SSD part of the Fusion Drive also reduces the wear on the HD part, so it is actually conceivable that a Fusion Drive might(!) actually live longer than just an HD alone would have.

Isn't an Aperture library (as well as most applications and other libraries) a folder containing many individual files, at least from a file system perspective? Only at the highest level of the user GUI is the set of files displayed as a single file.

It is a directory, but it's also a package (try "mdls 'aperture_library_name'" and look for "com.apple.package" in the output), so it's treated differently by the file system.

Phil did say that "applications" moved back and forth with Fusion Drive, and applications are also essentially just directories (they are typically both packages and also bundles). Fortunately, it's moot, since we know thanks to Jollyjinx that Fusion Drive works at least with 1MB sub-file chunks, and possibly smaller.

I'm not a storage guru, but this sounds somewhat like EMC's Fully Automated Storage Tiering (FAST). Here's a whitepaper that explains FAST, for those who think that Fusion is nothing but a hybrid drive:

FAST is used in large SANs, as others have pointed out. Think major $$$. If Apple is bringing something similar to consumer desktops, I think it's a pretty big deal.

Perhaps Lee could compare FAST (or other similar enterprise-y tech; I'm not trying to push EMC tech, I just happen to know more about it) to Fusion? I know that he has a lot of experience with large SAN stuff.

Disclaimer: I work for EMC, but this is not me trying to push my employer onto you. I'm sure other large SAN vendors have similar tech, I just don't know about them.

I have two FrankenMacs where I have replaced the optibay with an SSD. I will definitely implement this on both machines. I still believe I will have to put the OS on a dedicated partition on the SSD though, as I don't believe OS X can boot off such a setup natively.

I have two FrankenMacs where I have replaced the optibay with an SSD. I will definitely implement this on both machines. I still believe I will have to put the OS on a dedicated partition on the SSD though, as I don't believe OS X can boot off such a setup natively.

Separate SSD and HDD? That is exactly what this does, so yeah, it should boot OS X natively.

This is totally awesome. The hard disk in my late 2009 iMac failed six months ago (Yes, one of these). Since it had to be opened up anyway, I replaced the optical drive with an SSD. With Fusion Drive, it means I'll be able to get even more performance, convenience and life out of my iMac.

Ultimately, Fusion Drive is a two-disk consumer solution designed to give some of the same benefits. It uses the file system and a logical volume manager to approximate a big enterprise tiering solution, but lacks many of the uptime-preserving things which make tiering an OK thing to use in the enterprise.

Yeah, that's what I'm wondering as well: How is Apple dealing with the lack of the uptime-preserving stuff?

One thing I want to confirm; the true Fusion drive is a single device, correct? Jollyjinx is running two separate physical drives, which is definitely more prone to failure (especially with an external USB device in the mix), but are we considering Fusion to be somehow less reliable when it's a single package that contains flash and spindle in one case? I'm sure the mixture has some impact on reliability, but is it expected to be a noticeable degradation in MTBF?

I don't think a true Fusion drive is a single device. This is pure speculation, but it's only available on hardware that supports two-drive configurations. During the Apple event, they showed the internals of the new iMac, and there was an area for the SSD and a separate area for the HDD.

EDIT Updated with screencap from event:

Thanks, I completely missed that in the launch presentation, so I'd been assuming this was a single drive/single interface solution.

Does anyone know if there is a way to exclude some files or directories from ever being on the SSD?

My 27" iMac doubles as a TV and it constantly thrashes the HD for caching or saving raw ATSC streams. This can be roughly 8GB/hour. It can easily consume hundreds of gigs if left unattended.

You have an application-level problem then. My EyeTV is set to a live buffer size of 3GB which it never exceeds unless I've actually programmed a permanent recording.

I doubt that my setup would actually clog up the SSD – unless I actually kept watching the same recording all the time, it should end up on the HD in any case, either by default right away or after being pushed out by more frequently used data.

Does anyone know if there is a way to exclude some files or directories from ever being on the SSD?

My 27" iMac doubles as a TV and it constantly thrashes the HD for caching or saving raw ATSC streams. This can be roughly 8GB/hour. It can easily consume hundreds of gigs if left unattended.

Your best bet would be to make a second partition on the HDD, just for the DVR. It's not the ideal solution (obviously), but that'd be a guaranteed way to pull that off. We might know a better way down the road, once people have torn Fusion Drive completely apart (such as setting a flag on a directory that says 'keep all my stuff on the crappy drive!').

Sort of peripheral to the main point, but an Aperture library is actually a bunch of separate files. The OS X package capability is used to make it look like a monolithic object, but if you drill down in the Terminal or do a 'show package contents', you can see the entire directory tree and all of the (many) individual files contained therein.

Might move this question to the Linux forum, but this seems like something Linux should be able to do/has been doing already? I'm not up on my ZFS/LVM knowledge, so can someone fill me in if one could set up such a solution for a Linux box?

"... if you have a 50GB Aperture library full of photos, for example, or a big multi-gigabyte virtual machine, will Fusion Drive promote the entire thing or just the parts of it that you're repeatedly reading?"

Aperture is a package. Block level promotion might be useful for the largest files in the package (the RAW images).

Lee Hutchinson / Lee is the Senior Reviews Editor at Ars and is responsible for the product news and reviews section. He also knows stuff about enterprise storage, security, and manned space flight. Lee is based in Houston, TX.