All Snapshots Are Not Created Equal

Over the past decade or so, snapshots have become a standard feature of disk arrays, volume managers, file systems and even PCI RAID controllers. The pitch from vendors of these products is pretty much the same: "With our technology you can take a snapshot in just a second or so and it will hold only the changed blocks, taking up much less disk space than a full copy." While that statement may typically be true, there are big differences in snapshot implementations.

When you are considering snapshot technology, ask these questions of your vendor:

1. What technology do you use? Copy-on-write, redirect-on-write, Black Magic?
2. How will my performance be affected by having 5 snapshots of a volume? 20?
3. Do I have to dedicate space to snaps ahead of time?
4. How much space does the first snapshot take?
5. What is the block granularity of your snap technology?

The biggest difference to consider is the underlying technology. When an application writes to a disk using the most common snapshot technology, copy-on-write, the snapshot provider copies the contents of the block being overwritten to a new location in the snapshot file. Copy-on-write requires three I/Os for a write: read the current contents of the block, write the new data and write the old data to the snapshot. Redirect-on-write snaps, used by NetApp's Write Anywhere File Layout (WAFL) and ZFS, among others, write the new data to free space on the disk and update the file system, or volume, metadata to include the new block in the current data set and the old one in the snapshot.

Where copy-on-write requires three I/Os per write, redirect-on-write, like a system without snaps, still performs only one. Note that both techniques require metadata updates, but they're not significant to system performance as they're almost always cached. Copy-on-write snaps can usually be sent to a different RAID set or disk tier while redirect-on-write snaps usually have to be in the same tier as the running data.
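To make the I/O arithmetic concrete, here is a minimal Python sketch of the two approaches. It is a toy model, not any vendor's implementation: the class names, the in-memory lists standing in for disk blocks and the io_count counter are all assumptions made for illustration. Metadata updates are treated as cached and are not counted, and the copy-on-write model assumes a block is copied out only the first time it is overwritten after the snapshot is taken.

```python
class CopyOnWriteVolume:
    """Toy copy-on-write volume: old data is copied out before an overwrite."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, overwritten in place
        self.snapshot = None        # block index -> preserved old contents
        self.io_count = 0           # data I/Os only; metadata assumed cached

    def take_snapshot(self):
        self.snapshot = {}          # starts empty: nothing has changed yet

    def write(self, index, data):
        # Assumption: only the first overwrite after a snapshot pays the copy.
        if self.snapshot is not None and index not in self.snapshot:
            old = self.blocks[index]      # I/O 1: read the current contents
            self.io_count += 1
            self.snapshot[index] = old    # I/O 2: write old data to the snapshot
            self.io_count += 1
        self.blocks[index] = data         # I/O 3: write the new data
        self.io_count += 1

    def read_snapshot(self, index):
        # Unchanged blocks are still served from the live volume.
        return self.snapshot.get(index, self.blocks[index])


class RedirectOnWriteVolume:
    """Toy redirect-on-write volume: new data always goes to free space."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # physical blocks
        self.live_map = list(range(len(blocks)))  # logical -> physical location
        self.snap_map = None                      # frozen copy of the map
        self.io_count = 0

    def take_snapshot(self):
        self.snap_map = list(self.live_map)       # metadata copy only, no data I/O

    def write(self, index, data):
        self.store.append(data)                   # the only data I/O: write to free space
        self.io_count += 1
        self.live_map[index] = len(self.store) - 1  # metadata update, assumed cached

    def read(self, index):
        return self.store[self.live_map[index]]

    def read_snapshot(self, index):
        return self.store[self.snap_map[index]]


cow = CopyOnWriteVolume(["a", "b", "c"])
cow.take_snapshot()
cow.write(1, "B")

row = RedirectOnWriteVolume(["a", "b", "c"])
row.take_snapshot()
row.write(1, "B")

print("copy-on-write I/Os:", cow.io_count)      # 3
print("redirect-on-write I/Os:", row.io_count)  # 1
```

In both models the snapshot still returns the old "b" for block 1 while the live volume returns "B". The difference is where the old data ends up: the copy-on-write model moves it into a separate snapshot area, which is why it can live on another RAID set or tier, while the redirect-on-write model leaves it in place alongside the running data.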

Then there are VMware's log file snapshots. These store the block changes in a log file, freezing the original data in the virtual machine disk (VMDK). While this technique creates snapshots quickly and can be space-efficient, it means that all disk I/O must check each snapshot in turn to see if it has the latest version of the disk block. VMware snaps can significantly slow down your system if you keep them around too long or create more than two or three snapshots of a VM.
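The cost of a long snapshot chain can be sketched the same way. The Python below models only the behavior described above, a frozen base disk plus one change log per snapshot, and is not VMware's actual on-disk format. The DeltaChainDisk class and its lookups counter are hypothetical names introduced for illustration.

```python
class DeltaChainDisk:
    """Toy log-style snapshots: a frozen base plus one change log per snapshot."""

    def __init__(self, base_blocks):
        self.base = dict(enumerate(base_blocks))  # original data, frozen after first snapshot
        self.deltas = []                          # oldest first; the last one receives writes

    def take_snapshot(self):
        self.deltas.append({})                    # fast: just start a new, empty log

    def write(self, index, data):
        if self.deltas:
            self.deltas[-1][index] = data         # changes land in the newest log
        else:
            self.base[index] = data               # no snapshots yet: write in place

    def read(self, index):
        """Return (data, lookups), checking each log from newest to oldest."""
        lookups = 0
        for delta in reversed(self.deltas):
            lookups += 1
            if index in delta:
                return delta[index], lookups
        return self.base[index], lookups + 1      # fell through to the base disk


disk = DeltaChainDisk(["a", "b"])
disk.take_snapshot()
disk.write(0, "A")        # recorded in the first log
disk.take_snapshot()
disk.take_snapshot()

print(disk.read(0))       # ('A', 3): two newer logs checked before the hit
print(disk.read(1))       # ('b', 4): all three logs checked, then the base
```

Every snapshot added to the chain adds one more place a read may have to look, which is the effect that makes a long-lived pile of these snapshots expensive.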

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M. and concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage ...