More on Fusion Drive: How it works, and how to roll your own

It's not Intel SRT, it's not file-based, and it works on OS X right now.

Two blog posts by Tumblr user Jollyjinx have shed some more light on the inner workings of Apple's Fusion Drive. Announced last week at Apple's event in San Jose, Fusion Drive marries a solid-state disk and a spinning hard disk drive together into a single volume, relying on the speed of the SSD to accelerate all writes and reads on the most often-used files and the size of the HDD to hold the much larger mass of less often-referenced files.

Based on Phil Schiller's remarks at the event, I speculated that Fusion Drive was a software-based, file-level automated tiering solution. A Fusion Drive-equipped Mac will come with a 128GB SSD and a much larger hard disk, from 1 to 3 terabytes. Floor reports from the event revealed that the two disks are visible as a single volume, with the total amount of space in the volume equal to the two drives' aggregated capacities. Schiller's comments indicated that Fusion Drive keeps track of what files and applications are being frequently read, physically moving (or "promoting," as it's commonly called in enterprise tiering solutions) those files and applications from the HDD to the faster SSD. At the same time, files and applications on the SSD which haven't been referenced in a while are moved back down ("demoted") to the HDD, to make room for more files to be promoted.

Many questions lingered, though, in the absence of any real technical info from Apple (and its Fusion Drive tech document provides very few hard details on the underlying functionality). Is Fusion Drive really a tiering technology, actually moving the data, or is it more of a caching solution? Does it rely on Intel's Smart Response Technology, which is available in Ivy Bridge chipsets like those in the new round of Fusion Drive-equipped Macs? Does it use the volume management features Apple introduced last year in Core Storage? Does it move whole files or just pieces of files? How does it keep track of what it's moving? Will it work on older Macs, or only newer Ivy Bridge Macs with Apple-provided SSDs?

BYO Fusion

Some of those questions are now answered. In the first of two blog posts, Jollyjinx sets out to build his own Fusion Drive using a 120GB OCZ Vertex 2 connected to his Mac's SATA bus and a USB-attached 750GB hard disk drive.

Core Storage, explained by Ars's John Siracusa in his OS X 10.7 Lion review, is used as the logical volume manager to tie the two physical devices together into a single volume group. Once the volume group is created, Jollyjinx creates a usable HFS+ volume inside of it. This is all accomplished using diskutil, the command line version of Disk Utility, since the graphical version doesn't yet support the necessary commands.
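The general shape of those diskutil commands looks something like the following sketch. The device identifiers (disk1, disk2) and the volume group name are illustrative and will differ on your system, and note that creating a Core Storage volume group erases both devices, so back up first.

```shell
# List attached disks to find the SSD and HDD device identifiers
diskutil list

# Tie the two physical devices together into a single Core Storage
# logical volume group (replace disk1/disk2 with your actual devices)
diskutil coreStorage create FusionGroup disk1 disk2

# Create a Journaled HFS+ volume spanning the whole group, using the
# lvgUUID printed by the previous command
diskutil coreStorage createVolume <lvgUUID> jhfs+ Fusion 100%
```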

Surprisingly, no additional configuration was necessary for the volume to begin exhibiting Fusion Drive-like tendencies. Jollyjinx created 140GB of dummy files and directories on the volume using the dd command, and the system automatically placed about 120GB of those on the SSD before dropping the rest onto the HDD (easily observable by the drop in write speeds as dd's output was redirected from the SSD to the HDD). After the files were all in place, Jollyjinx then triggered a whole bunch of read activity on the volume, using the dd input file flag to constrain the reads to the directories which had landed on the HDD.
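A rough sketch of that kind of exercise follows; the paths and sizes here are illustrative, not Jollyjinx's exact invocations (macOS dd accepts the lowercase bs=1m suffix).

```shell
# Fill the volume with dummy data; write throughput visibly drops
# once the writes spill over from the SSD onto the HDD
dd if=/dev/zero of=/Volumes/Fusion/dummy01 bs=1m count=10240

# Later, generate read traffic against files that landed on the HDD,
# sending output to /dev/null so only reads hit the volume
dd if=/Volumes/Fusion/hdd-resident/dummy37 of=/dev/null bs=1m
```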

By monitoring the throughput of both the HDD and SSD at the device level with iostat, it's possible to track what happens next. As soon as Jollyjinx stops the reads and the file system goes idle, the SSD lights up with write activity, sending about 14GB worth of writes from the HDD to the faster disk. After another hour of re-reading the same directories as before, they begin to show SSD read speeds instead of USB-attached HDD speeds.
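On OS X, per-device throughput can be watched with something like the command below (disk0 and disk1 standing in for the SSD and HDD device identifiers on the test machine):

```shell
# Report KB/t, transfers/sec, and MB/sec for each device every second;
# a post-idle burst of SSD writes reveals the promotion activity
iostat -d disk0 disk1 1
```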

Intel SRT does not handle writes this way—whether it's operating as write-back or write-through cache, SRT mirrors writes (immediately or within a short amount of time) down to the hard disk, which is not the observed behavior. Plus, as has been noted, SRT currently doesn't work with SSDs larger than 64GB. It is absolutely clear that Fusion Drive does not use SRT.

Based on these findings, Fusion Drive is indeed a base operating system feature, either contained within Core Storage or built into OS X 10.8.x (Jollyjinx notes at the bottom that he's using 10.8.2). It appears that Fusion Drive detects the SSD-ishness of a drive based on SMART info read across the SATA bus, though it's possible that Apple might be using Microsoft's SSD detection method and simply testing attached drives' throughput. If a Core Storage volume contains an HDD and an SSD, Fusion Drive appears to be automatically activated.

Block- or file-based?

Another question, though, is whether or not Fusion is "block" or file-based—that is, does it promote entire files, or merely promote the parts of files that are being referenced? The difference is important: if you have a 50GB Aperture library full of photos, for example, or a big multi-gigabyte virtual machine, will Fusion Drive promote the entire thing or just the parts of it that you're repeatedly reading?

Jollyjinx tackles this in his second post, again using dd to only read the first megabyte of several 100MB files located on the HDD side of his home-grown Fusion Drive. After giving Fusion Drive some idle time to work, telling dd to read the entirety of the 100MB files generates significant IO on both the SSD and the HDD—the first megabyte of each file is coming off the SSD, and the rest is coming off the HDD.
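The pattern of that test can be sketched like this, with hypothetical paths and the assumption that the files in question are resident on the HDD side:

```shell
# Read only the first 1MB of each 100MB test file...
for f in /Volumes/Fusion/hdd-resident/file*; do
    dd if="$f" of=/dev/null bs=1m count=1
done

# ...then, after giving Fusion Drive idle time, read the files in
# full; iostat shows the first megabyte coming off the SSD and the
# remainder coming off the HDD
for f in /Volumes/Fusion/hdd-resident/file*; do
    dd if="$f" of=/dev/null bs=1m
done
```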

Clearly, Fusion Drive is operating at the "sub-file" level, which is good news. I had speculated that it was purely a file-based technology, which does have some advantages, but sub-file neatly works around the disadvantages that file-based tiering brings when working with very large files that exhibit high rates of change.

Also settled with this experiment is the question of timing. Fusion Drive behaves itself, waiting for uninterrupted idle time in order to do its tiering rather than stealing IOs away from the user while the system is active. It's not an instantaneous technology (nor should it be, since the user's reads and writes should always be prioritized over system housekeeping activities like this). There are still questions about the nature of the data movement—are the sub-file chunks promoted by being moved, or are they copied?—but the question is largely academic at this point, since even if the chunks' bits still exist on the HDD after being promoted to SSD, it's clear that their canonical location changes. This makes Fusion Drive fundamentally a tiering technology—not a cache.

We have many more questions about Fusion Drive, and we hope to get some answers soon. Our Fusion Drive-equipped Mac Mini has shipped and should be arriving within the next few days. We'll dive deep once it's here!

Promoted Comments

JollyJinx has further revealed that a DIY Fusion Drive also works for a ZFS-formatted volume. As JollyJinx and Ars Technica have both established, a Fusion Drive works at the block level, easily evidenced by the fact that the command-line dd utility generates the expected behavior (dd is a block-level copy command). Since it is now proven that non-HFS+ volumes work, it should in theory be possible to use it with FAT32, exFAT, ext3, or NTFS volumes, at least while running in OS X.

At this point I have not seen anything that would suggest it would work when natively booted into Linux or Windows. In fact, I would very much doubt that it would, since Core Storage is an OS X-only feature.

However, Parallels running with a Boot Camp volume rather than a disk image could in theory work.

Perhaps Lee could compare FAST (or other similar enterprise-y tech; I'm not trying to push EMC tech, I just happen to know more about it) to Fusion? I know that he has a lot of experience with large SAN stuff.

I was a presales engineer at EMC focusing on core storage for a couple of years, and before that I did storage architecture at a mostly-EMC shop (Boeing), so I'm pretty good on the ins and outs of FAST and FAST VP, actually.

The primary differences are that big tiering solutions like FAST or Dell Compellent's Fluid Data Architecture are designed to work on systems with gobs of cache and gobs of IO ability. Plus, if I'm remembering right, FAST really works best if you've got three tiers instead of two (a fast but small SSD tier, a larger Fibre Channel or SAS middle tier for workhorse stuff, and a big SATA tier for archiving--FAST VP with just SSD & SATA is impractical, and FAST VP with just SSD & FC is too expensive).

Enterprise solutions also assume that you've got availability, support, and backup safety nets in place. You don't care about the effects of tiering on the SSD tier's write amplification, for example, because those SSDs are all under a maintenance agreement--the second one of them starts acting flaky, the vendor just shows up and replaces it. You've also got more layers of abstraction in the mix--it's not just one SSD and one HDD, but lots of HDDs and lots of SSDs, organized into pools or some other kind of logical construct, with volumes thin-provisioned out of those pools. Individual component failure isn't an issue.

The watermarks for determining how and when data are to be tiered are going to be fundamentally different, as will the actual granularity of tiering. FAST VP was doing its tiering in 1 GB chunks on VNX when I left EMC; the Symmetrix FAST VP flavor was infinitely more customizable and powerful, but still operated on...um...some number of 768 KB tracks, but I don't remember the exact chunk size.

Ultimately, Fusion Drive is a two-disk consumer solution designed to give some of the same benefits. It uses the file system and a logical volume manager to approximate a big enterprise tiering solution, but lacks many of the uptime-preserving things which make tiering an OK thing to use in the enterprise.

These types of new drive technologies, especially those that run in software, scare the crud out of me. I would be very cautious using them on mission-critical devices, servers, etc... I don't know if I really trust HFS+ to use LVM across multiple disks... HFS+ seems to have enough issues on a single disk...

The Windows Home Server users who got bit by the drive pooling bug say hello.

It's worth noting that this is essentially a consumer packaging of an enterprise SAN technology. While it certainly may have lost some reliability in the transition from seven-figure SAN racks to four-figure consumer desktops, the technology itself isn't unreliable.

To add to my previous comment... A single spinning disk is actually quite reliable, and a single SSD even more so. But using both in tandem along with a software "controller" leaves many points of failure.

I would want to know how the system behaves if you remove the SSD or the HDD from the mix, especially during writes, etc...

In a RAID setting you gain reliability (obviously talking mirroring or other redundant RAID levels). But from my understanding this is no different than when people blasted the ultrabook that had two SSDs in a striped RAID set. You have more points of failure... even more here, because you have your SATA controller, a software controller, LVM, etc...

LVM has been old hat in commercial Unix deployments for a long time. It was time-tested in production environments long before most people ever heard about it.

Although I am skeptical that LVM has any relationship to how SSD caching schemes work. I don't think the author's experiments imply anything about how Fusion Drive works.

It's not so much the reliability of the system that I think is scaring people; it's what happens if one of the disks dies and you're left with unusable fragments on the disk that's still working. If you run Time Machine frequently it shouldn't be too much of a hassle, but that would be my fear. Or even, if both disks are still working, what if you want to read the drives independently of the machine on which they were configured, for whatever reason?

Data is moved from RAM to HDD all the time and we rarely bat an eyelid, but Fusion Drive is an unknown for many people.

No one has talked about what happens if one of the disks goes south while in use. Maybe JollyJinx could run some tests where he rips out the USB drive to see what happens to the system. I suspect a major crash, and then the question is: how do you recover your data? Or your system?

That's on my list of things to try once I have a Fusion Drive-equipped computer in hand. I'll see if I can yank the SATA cable out of one of the devices while it's live.

Edit - assuming I can get to it. Need to look at a mini tear-down and see if it's possible. If not, I'll use dd to write some random data to an important part of the disk and break it that way.

Yep. I had the same initial response, but that initial response is conditioned by years of working with RAID.

With Fusion Drive, I've come to grips with it. Whether it's the existing single-disk approach or Fusion Drive, either way, if one disk fails, you lose all your data. So make sure your backups work, and get on with life.

I'm curious how, or even *if* Fusion will work with Boot Camp. When you are booted into Windows, will you ever see the benefit of the Fusion Drive? Since this appears to be a feature driven by Mac OS features, I would imagine not. If this was a hardware implementation, it shouldn't matter. Furthermore, I'm curious how the Fusion Drive would appear in Windows. Will it appear as two separate drives? If so, could Windows 8's Storage Spaces be made to work with it (and would there be any practical benefit)?

I really wouldn't sweat it - it also works on ZFS volumes, which, you know, are entirely software-driven.

Removing either disk will cause it to fail, since it's not a caching solution. Parts of files exist on the SSD and other parts on the HDD, so removing either disk would make the data on the other one useless, just like a striped RAID set.

Back up often. Apple provides a simple solution for that. Sure, you increase the chances of failure, but you need to back up anyway. What you gain is a lot of performance.

Can I add a Windows partition? You can create one additional partition on the hard disk with Fusion Drive. You can create either a Mac OS X partition or a Windows partition.

If creating a Windows partition, use Boot Camp Assistant to create it, not Disk Utility. From the Go menu, choose Utilities. Then, double-click Boot Camp Assistant and follow the onscreen instructions. For more information on Boot Camp see the Boot Camp support page.

Note: Boot Camp Assistant is not supported at this time on 3TB hard drive configurations.

If it is as fault-tolerant as it is easy to set up, you can bet I will be picking up a couple of SSDs in a Black Friday deal for my Mac mini and '09 MBP.

Another question: how does it work if you have more than just two drives of varying speeds? Say you have one 500MB/s SSD, a 300MB/s SSD, and a 7200 RPM HD; can you use it with more than just two speed tiers? And can it work with RAID setups?

I have been holding off on an SSD for my mini because I would need a large one; now I can buy a 128GB drive, throw the HD in a FireWire or Thunderbolt case, and go to town. I wonder what the Apple support options will be?

Apple's note doesn't quite cover everything. From the sound of it, I'm guessing that you can't use Fusion Drive on any partition that needs to be accessible from Windows. So if you want to access your documents from both Windows and OS X (which, I'd imagine, most people who are dual-booting would), no Fusion Drive for you. Unless of course Apple is planning to release Windows drivers, which I doubt.

At best, the LVM does a copy job of the block, then deletes the source block after confirming it was written successfully to destination. Meaning you should be okay from a data integrity standpoint, even if a tool was needed to check blocks and determine where all the blocks are after a failure of some kind. At worst, it's like having a RAID 0 array and losing a drive. Boned. Run Time Machine to an external drive and you should be as fully protected against failure as 99% of the consumer population is, with or without Fusion.

One thing I want to confirm: the true Fusion Drive is a single device, correct? Jollyjinx is running two separate physical drives, which is definitely more prone to failure (especially with an external USB device in the mix), but are we considering Fusion to be somehow less reliable when it's a single package that contains flash and spindle in one case? I'm sure the mixture has some impact on reliability, but is it expected to be a noticeable degradation in MTBF?

All the fear of the drive or software failing and losing everything: if you are implementing 3-2-1 you should have no fear. 3 copies, 2 backups, 1 offsite. I think Apple operates on a data-safety assumption that everyone uses Time Machine. If you are smart enough to custom-partition an SSD and a hard drive into a single Fusion Drive from the command line, you'd better have a good backup strategy.

However, for standard users buying Fusion, I wonder if there is a larger chance of failure; just like RAID 0 (often called "scary RAID"), it multiplies your chances of catastrophic failure by a factor of two.

I am definitely interested in this technology. Since the beginning I've thought that hybrid SSD drives were just doing it wrong. I wondered, as many did, what happens if one drive fails... but after some thought, it's not too hard to guess.

I suppose there is no difference in what happens in the event of a single drive failure: you lose your data. What would happen if you just had the SSD and it failed? You would lose your data. What would happen if you just had a standard HDD? You'd lose your data. What happens if you have both and one fails? You lose your data if pieces exist on both drives. I think that when we see two drives, the techies inside us would like to think there is some level of redundancy. I suppose if you had a full file, all pieces, on one drive and the opposite drive failed, you would still be able to recover that file. It would probably have to be a seldom-used file to not be split into pieces across the separate drives.

I think as geeks we see this solution and automatically spot a problem, which is the extra points of failure (software solution, two drives to fail, etc.). We also see how the alternative (just an SSD for the OS and a 1TB storage drive) has its advantages: in the event of one failure, you at least have the other drive's content intact. On the other hand, as one commenter said above, this is a four-figure implementation of a seven-figure SAN tiering solution, so it will not be without its caveats. Overall, it seems like a good solution.

The blog post isn't clear (or I fail at reading), but do I have to boot from the OS X install disc to run these commands? I see that it's unmounting disk1, which I assume is the system disk, which leads me to think that I need to boot off the CD and then do this.

Anyone done this yet?

EDIT: I had to boot up with the OS X installation disc. Following the instructions, I created a logical volume and I now have a Fusion Drive on my 2010 MacBook.

Lee Hutchinson / Lee is the Senior Reviews Editor at Ars and is responsible for the product news and reviews section. He also knows stuff about enterprise storage, security, and manned space flight. Lee is based in Houston, TX.