Explain Snapshots

This seems to be a popular search term, so I think it’s worth covering off. I touched on this in my old top post about Fractional Reservation, but I’ll cover the alternatives here as well.

NetApp snapshots used to be pretty unique in the industry, but the industry term for this technology is now generally Append-on-Write / Redirect-on-Write (new writes are appended to the “end”, or redirected to free blocks, depending on how you look at it) and quite a few vendors do it this way. Put very simply, all new data is written to new (zeroed) blocks on disk. This does mean that snapshot space has to logically live in the same location as the production data, but that really shouldn’t be a problem with wide-striping / aggregates / storage pools (pick your preferred vendor term). When a snapshot is taken, the inode table is frozen and copied. The inode table points to the data blocks, and those data blocks now become fixed. When the active filesystem “changes” a block, the new data is actually written to a new location on disk, so there is no overhead to the write (the new blocks are already zeroed). In other technologies (not NetApp) this also forms the basis of automated tiering: once data is “locked” by a snapshot it will never be overwritten, so it can safely be tiered out of SSD or even SAS, as read performance is rarely an issue. NetApp uses FlashPools to augment this, and a snapshot is a trigger for data to be tiered out of the FlashPool, as it’ll never be “overwritten”.
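To make the mechanics concrete, here is a minimal, purely illustrative sketch of redirect-on-write in Python. The class name `BlockStore` and its methods are my own invention, not any vendor’s implementation; the point is that a snapshot is just a frozen copy of the inode table, and every write allocates a fresh block rather than touching the old one.

```python
# Toy redirect-on-write (RoW) model. BlockStore is a hypothetical name,
# not any vendor's actual code -- it just shows the pointer mechanics.

class BlockStore:
    def __init__(self):
        self.physical = {}     # physical block id -> data
        self.next_free = 0     # naive "next free block" allocator
        self.inode_table = {}  # logical block -> physical block
        self.snapshots = []    # each snapshot is a frozen inode table

    def write(self, logical, data):
        # New data always goes to a fresh physical block; the old block
        # is left untouched, so any snapshot pointing at it stays valid.
        pb = self.next_free
        self.next_free += 1
        self.physical[pb] = data
        self.inode_table[logical] = pb  # redirect the pointer

    def read(self, logical, snapshot=None):
        table = self.snapshots[snapshot] if snapshot is not None else self.inode_table
        return self.physical[table[logical]]

    def snapshot(self):
        # Taking a snapshot is just copying the inode table: metadata
        # only, no data is moved, and later overwrites cost one write each.
        self.snapshots.append(dict(self.inode_table))
        return len(self.snapshots) - 1
```

Note that overwriting a snapshotted block costs exactly one write, the same as a fresh write, which is the whole appeal of the approach.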

Other/traditional vendors (although this list is rapidly shrinking) take a Copy-on-Write approach. This means that once a snapshot is taken and the used blocks are “locked” in, any overwrite has to first copy the original data to the snapshot area (1 read + 1 write) and then write the new data (1 write), so CoW generally carries roughly a 3x I/O overhead on the first overwrite of each block. However, one advantage here is that snapshots can then be stored in a totally different location, and the layout of snapshots and production data can be much more tightly controlled. This is why the traditional enterprise arrays (DS8000, HDS VSP, EMC VMAX) still do things with Copy-on-Write. Those arrays are so ridiculously quick anyway that the CoW overhead is generally negligible.
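A quick sketch of the CoW overwrite path makes the cost easy to see. Again this is a hypothetical model (`CoWStore` is my own naming), simply counting I/Os so the first-overwrite penalty is visible:

```python
# Toy copy-on-write (CoW) overwrite path, counting I/O operations.
# CoWStore is illustrative only, not any array's real implementation.

class CoWStore:
    def __init__(self):
        self.production = {}      # logical block -> data
        self.snap_area = {}       # originals copied away for the snapshot
        self.snap_locked = set()  # blocks frozen by the current snapshot
        self.io_count = 0

    def snapshot(self):
        # The snapshot itself is cheap: just mark current blocks as locked.
        self.snap_locked = set(self.production)

    def write(self, logical, data):
        if logical in self.snap_locked:
            # First overwrite after a snapshot: the original must be
            # copied out before the block can be reused.
            self.io_count += 1                        # read the original
            self.snap_area[logical] = self.production[logical]
            self.io_count += 1                        # write copy to snapshot area
            self.snap_locked.discard(logical)         # the penalty is paid once
        self.production[logical] = data
        self.io_count += 1                            # write the new data
```

A plain write costs one I/O; the first overwrite of a snapshotted block costs three; subsequent overwrites of the same block drop back to one.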

There is another option, and that is journalised snapshots. I mention this method as it’s what VMware does (although VMware is clearly not a storage vendor). Ignoring VMware VSAN, I’m talking about traditional VMware snapshots of a VMDK. The snapshot locks the written data of the VMDK file (as in NetApp snapshots) and all subsequent writes go to a journal file. A few different technologies use this journal mechanism, although in quite different ways (EMC RecoverPoint and Actifio both use journals, but very differently). The problem with VMware journal snapshots comes when you delete the snapshot: all the writes that accumulated in the journal file now need to be re-written into the base VMDK, at the same time as production writes are still going to the disk. This can create a massive performance problem.
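The journal mechanism and the deletion (consolidation) problem can be sketched like this. `JournaledDisk` is a hypothetical, simplified model in the style of a VMDK snapshot, not VMware’s actual on-disk format:

```python
# Toy journal-based snapshot, loosely modelled on a VMware VMDK snapshot.
# JournaledDisk is a hypothetical name; the real delta-file format differs.

class JournaledDisk:
    def __init__(self):
        self.base = {}       # logical block -> data (the base VMDK)
        self.journal = None  # active journal/delta file, or None

    def snapshot(self):
        self.journal = {}    # base is now frozen; new writes go to the journal

    def write(self, logical, data):
        target = self.journal if self.journal is not None else self.base
        target[logical] = data

    def read(self, logical):
        # Reads check the journal first, then fall through to the base.
        if self.journal is not None and logical in self.journal:
            return self.journal[logical]
        return self.base[logical]

    def delete_snapshot(self):
        # Consolidation: every journalled write is re-written into the
        # base in one go -- this burst of I/O, on top of ongoing
        # production writes, is where the performance pain comes from.
        replayed = len(self.journal)
        self.base.update(self.journal)
        self.journal = None
        return replayed
```

The longer a snapshot is left in place, the bigger the journal grows and the more painful the consolidation when it is finally deleted.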

Append-on-Write / Redirect-on-Write is now employed by Dell Compellent, HDS HUS, EMC VNX, HP 3PAR, IBM V7000 and many others, including most start-ups. It’s definitely the most efficient way of taking snapshots, and many vendors will claim theoretically no limit to the snapshot capacity.

Downsides?

If you are too aggressive with your snapshot schedules, the system is always copying inode tables, always managing a large number of snapshots, and never gets a chance to do housekeeping. To do AoW/RoW you need free blocks, and to get free blocks the system needs a chance to do housekeeping. Generally this involves a disk scrub of some sort (going to each data block, checking whether any snapshot still points at it, and if not, zeroing the data and freeing it up). I’ve seen busy storage systems take 24 hours+ to free up deleted space. It can also hurt data locality: writes will no longer be located near related data blocks, and unless your storage system is specifically tuned to handle this sort of load (as most storage systems should be), this can create a performance problem. Even if your storage system is tuned, performing system maintenance to rebalance the data (either manually triggered, or automatically by the appliance) can bring significant performance gains for sequential workloads. Additionally, the more free space you have, the quicker your storage system will be, as it doesn’t have to “seek” for the next free block. A full system has to “seek” more to find free space and may be forced to break the data up into smaller chunks in order to fill in the holes.
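The scrub described above boils down to a reference check across every inode table, which is why it competes with production I/O and can take so long on a busy system. A minimal sketch, assuming a simple model where each table maps logical blocks to physical block ids (the function name `scrub` is my own):

```python
# Toy background scrub for a redirect-on-write system: free any physical
# block that neither the active filesystem nor any snapshot points at.
# Hypothetical sketch -- real scrubs run incrementally against live I/O.

def scrub(physical_blocks, active_table, snapshot_tables):
    """Return (kept, freed) sets of physical block ids."""
    referenced = set(active_table.values())
    for table in snapshot_tables:
        referenced.update(table.values())
    freed = set(physical_blocks) - referenced   # safe to zero and reuse
    kept = set(physical_blocks) & referenced    # still pinned by a pointer
    return kept, freed
```

Note that a block is only reclaimable once *every* snapshot referencing it has gone, which is why aggressive snapshot retention directly delays the return of free space.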

Don’t overlook housekeeping, and don’t treat system maximums as a design target! Keep well within the limits of the storage system (whatever the vendor) and the storage system will look after you.

One final (very important) point: snapshots are not backups! You need to get your data off the storage system, in a format that can be recovered without the original storage system, before you can call it a backup. Don’t rely too heavily on snapshots, and consider what RTO/RPO is acceptable if you lose your entire storage system (snapshots and all).