In this post, the last in this series, I will discuss how a snapshot can
consume so much space that writes to the active file system fail, as well as
the mechanisms NetApp and EMC have created to avoid this fate.

Yes, it is true. You can get an ENOSPACE error when you are
using a metadata approach for creating snapshots, which is how WAFL manages snapshots on a NetApp filer. Recall the diagram I included a couple of posts ago:

Note that the additional blocks required by the snapshot are
invading the free space in the active file system. It is actually the light-colored
blocks (the “before” images of the blocks) which are held by the snapshot. At
NetApp, we used to have debates over whether the snapshot occupied the space, or
whether it was the active file system that did so. Whatever. The effect is
exactly the same. The storage space cost of a snapshot is equal to the number
of blocks which have been updated since the creation of the snapshot. Thus, you
can think of the storage space overhead of snapshots in this way:

From this diagram, you see that we are running a file system
that is about 70% full, with another 10% of snapshot overhead. That leaves
only about 20% of free space before the file system runs out entirely.
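The arithmetic behind this picture can be sketched with hypothetical block counts (a 1,000,000-block volume; the percentages are illustrative, not taken from any real filer):

```shell
#!/bin/sh
# Hypothetical 1,000,000-block volume: 70% holds live data, and 100,000
# blocks have been updated since the snapshot was taken. The snapshot
# pins one "before" image per updated block, so its space cost equals
# the updated-block count.
total=1000000
active_data=700000
updated_since_snap=100000
snap_overhead=$updated_since_snap

free=$(( total - active_data - snap_overhead ))
echo "snapshot overhead: $(( snap_overhead * 100 / total ))% of the volume"
echo "free space remaining: $(( free * 100 / total ))%"
```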

Absent space reservations, you could do this:

All available space has now been fully occupied by snapshot storage
overhead, even though there has been no increase in the amount of data in the
active file system. This is because we kept this snapshot around too long: enough
blocks were updated after its creation to exhaust
all free space. The next write to this file system will get an ENOSPACE error. That includes updates to files already present in the active file system, which would seem to require no additional space at all.
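The failure mode reduces to a simple budget check, sketched here with hypothetical block counts: once the before-image blocks held by an old snapshot consume all free space, the next copy-on-write allocation has nowhere to go.

```shell
#!/bin/sh
# Hypothetical budget: 70% of a 1,000,000-block volume holds live data,
# and an old snapshot now pins 300,000 before-image blocks.
total=1000000
active_data=700000
snap_overhead=300000

free=$(( total - active_data - snap_overhead ))
echo "free blocks: $free"

# In a copy-on-write design, even an in-place overwrite of existing data
# needs a fresh block, so zero free space means the next write fails.
if [ "$free" -le 0 ]; then
  echo "next write: ENOSPACE"
fi
```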

Hence the common NetApp heuristic: “Old snapshots are dear; new
snapshots are cheap.”

This was a depressingly common issue at NetApp while I was
there, particularly with storage administrators who migrated to NetApp NAS from
a more traditional SAN storage environment (typically EMC). Those folks would
behave like good storage professionals: They would utilize all available space.
They regarded free space as wasted space. Further, these folks tended to think
that if they had created an Oracle datafile of 100 GB in size, then that file was
locked down and in place. They regarded a storage device that returned an ENOSPACE error on an update to that file as naughty, irrational, and strange.

For these well-behaved storage professionals, the good habits
they had developed in the SAN context were a formula for disaster when dealing
with NetApp snapshots in a NAS context. By running with little or no free
space, they allowed no headroom for the snapshot overhead. Thus, ENOSPACE
errors were common.

I used to refer to snapshots as having a “dark side”. This is
the dark side I was talking about. The space allocated to a datafile is no
longer guaranteed. Once you take a snapshot, a write to that file can run out
of space anyway, even though the file is already fully allocated in the file system.

This led NetApp to introduce the notion of space reservations.
The architect of this concept was Bruce Gordon, the SAN marketing guy hired by
Rich Clifton during the 2000 to 2001 period. I will readily admit that I
fiercely resisted this concept. What space reservations do is
simple: if there are not enough free blocks in the file system to completely
duplicate all of the existing data, then snapshot creation fails. An
illustration will help. Before space reservations, if you had this:

You could not create a snapshot at all. You do not have enough
free space to duplicate the existing data. You must either free some space or
add capacity. Assuming you add capacity, you could then create a snapshot:

Snapshot overhead then begins to invade the reserved space.
As you begin to accumulate updated blocks, the snapshot overhead looks like
this:

Since you have reserved enough space to duplicate all of the
data that existed at the time of the creation of the snapshot, theoretically an
ENOSPACE error is impossible.
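The reservation rule itself reduces to one comparison at snapshot-creation time; a sketch with hypothetical block counts:

```shell
#!/bin/sh
# Space reservation in a nutshell: allow the snapshot only if free space
# could absorb a full duplicate of the data that exists right now.
# Block counts are hypothetical.
used=400000
free=450000

if [ "$free" -ge "$used" ]; then
  verdict="snapshot created: ${used} blocks reserved"
else
  verdict="snapshot refused: need ${used} free blocks, have ${free}"
fi
echo "$verdict"
```

With the duplicate reserved up front, copy-on-write overhead can never spill into space the active file system needs.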

I said previously that I resisted this concept. I used to tell
Bruce Gordon that as far as I was concerned, he was an EMC plant. Why? Because
space reservations destroy the one primary benefit of snapshots: Space
efficiency.

Go all the way back to my first post in this series. I stated
that the gold standard for Storage Layer Instantaneous Copy (SLIC) technologies
is BCVs. BCVs have lots and lots of advantages. They have absolutely no
performance penalty. They work beautifully. They have only one downside: They
require another set of disks. Before space reservations, snapshots did not. By providing the same basic
functionality as BCVs (instantaneous copy) without the storage overhead of
another set of disks, snapshots became the best way to do the job of Oracle database instantaneous hot backup.

With space reservations, the cost of snapshots became effectively
the same as BCVs. In that case, BCVs win. They do not have the
performance issues that metadata-based snapshots do. (This performance trade-off is discussed in detail in Part 2 of this series.) Removing the cost advantage of snapshots over BCVs was a major erosion
in NetApp’s core value proposition.

But, as Bruce Gordon said, “No customer will ever have an
ENOSPACE error on my watch.” Bruce attempted to establish a principle that space
would always be reserved such that a snapshot could never exhaust the active
file system free space.

Unfortunately, FlexClones, covered in detail in my previous
post, violate this principle. That is because FlexClones create another write
thread. Remember that each write thread has the potential to double the space
requirements, by overwriting every block in the snapshot. That was
illustrated by the following diagram from my previous post:

Note how FlexClone increases the space requirements by
adding another set of “after” image blocks to the mix. Simply reserving space
for one set of additional blocks is now insufficient. You would now need to
reserve space for two. Thus FlexClones make the following scenario possible:

You are now out of space again. The next write will get an
ENOSPACE error.
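Seen as arithmetic, the problem is that the classic single-duplicate reservation falls short once there is more than one write thread; a sketch with hypothetical block counts:

```shell
#!/bin/sh
# Each write thread (the active file system plus each writable clone)
# can overwrite every block captured by the snapshot, demanding its own
# full set of "after" images. Block counts are hypothetical.
used=400000
reserved=400000   # classic reservation: one duplicate of the data
writers=2         # active file system + one FlexClone

needed=$(( used * writers ))
echo "reserved: $reserved blocks, worst case needed: $needed blocks"
if [ "$reserved" -lt "$needed" ]; then
  echo "reservation insufficient: ENOSPACE still possible"
fi
```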

EMC snapshots make all of this impossible. By using a reserved
LUN pool approach, EMC simply allocates the space required for the snapshot.
The snapshot space is not shared with the active file system space. Thus, it is
impossible for the active file system to receive ENOSPACE from a snapshot. The
following graphic illustrates this:

The snapshot space is contained within the RLP. It is not
shared with the active file system. Running out of space within the RLP will
cause the snapshot to become invalidated. But it will not affect the active
file system at all. An ENOSPACE error can never be returned to the active file system
with this design, unless the user exhausts the space in the active file system itself.

Further, you decide how much space you want to allocate to the snapshot. Unlike WAFL-based snapshots, you are not writing a blank check for snapshot overhead, up to the full amount of data in the active file system. Rather, you can decide that the snapshot will only be allowed to take up 10% of that space if you want to. This adds discipline to the whole proposition of snapshot space overhead.
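The containment property can be sketched the same way; the RLP cap here is a hypothetical 10% of the volume:

```shell
#!/bin/sh
# Reserved LUN pool sketch: snapshot space is a separate, fixed budget.
# Sizes are hypothetical; the RLP is capped at 10% of the volume.
volume=1000000
rlp_cap=$(( volume / 10 ))
rlp_used=120000   # before-images copied into the pool so far

if [ "$rlp_used" -gt "$rlp_cap" ]; then
  echo "RLP exhausted: snapshot invalidated"
else
  echo "snapshot intact: $(( rlp_cap - rlp_used )) RLP blocks left"
fi
# Either way, the active file system's own free space is untouched.
echo "active file system writes: unaffected"
```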

Once again, it is for you as the customer to judge the relative
merits of these approaches. In my series on snapshots, I have attempted to
bring clarity to the debate between EMC and NetApp on the benefits and risks of
snapshots for Oracle database backup. Based upon the number of comments this
series has received, I think you are hearing me.

Future posts on this blog will cover how EMC NAS compares to
NetApp NAS for Oracle database storage.

Comments

Hi Jeff,

Excellent piece of very insightful information on both EMC's and NetApp's snapshot technology. I must say that it is straight talking, and hits the nail on the head.

I was impressed with your no-BS style when I first met you at one of the Fall Classics in NetApp. I probably met you once or twice. Now that I am at EMC, I am glad to see you dishing out good and honest stuff on both NetApp and EMC.

All of this is Greek to me so this post is not going to have any relevant comments regarding your subject matter. (And actually, I'd prefer that you not authorize this)

I followed you here from LinkedIn after I read your comments about fear, and I'm writing to let you know I've put your blog in circulation thru Stumbleupon so that other tech-savvy people may come across it. Your testimonials show you know what you're talking about, and I---well, I'm hoping to help you get more exposure.

Your considerations are valid, although a bit biased toward an Oracle point of view. Fortunately not all IT shops run on Oracle, and for unstructured data NetApp filers are somewhat reversing your long dissertations and considerations. But of course, admittedly, this is the wrong blog for non-Oracle data! Cheers

Fantastic series. We have both EMC and NetApp. The SAP databases are using EMC with FC but, unfortunately, all my Oracle databases are using NetApp with NFS. However unfortunate it is, I do love it: I have not encountered major performance issues, and it makes failovers to other machines simple and fast. I support databases from 10 GB in size to above 1 TB.

For most Oracle databases I have turned off the snapshot facility, and I depend on exports and traditional hot backups. For some small databases (less than 1 TB in size), I allocate the entire database files, including control files and redo log files, on the same qtree. I then use snapshots of "open" database files. Although these are "dirty" snapshots, I can still restore from these -- Oracle does its own automatic crash recovery.

However, since the Storage Group does not want to give me qtrees or volumes bigger than 1 TB, I'm forced to split a large database into two or more qtrees or volumes. Now, I cannot trust the "dirty" snapshots for restoring/recovering a database. I want to use snapshots with the Oracle "hot" backup technology for these large databases, but cannot find precise syntax on how to do this. I did find an example, but it was using SnapVault ... and we are only licensed to use SnapShot and SnapRestore. Can you help me on this? Thanks for your postings. Following is the NetApp extract for backing up a database (taken from a NetApp Best Practices document):
Step 4: Create Oracle hot backup script enabled by SnapVault.
Here is the sample script defined in “/home/oracle/snapvault/sv-dohot-daily.sh”:
#!/bin/csh -f
# Place all of the critical tablespaces in hot backup mode.
$ORACLE_HOME/bin/sqlplus system/oracle @begin.sql
# Create a new SnapVault Snapshot copy of the database volume on the primary filer.
rsh -l root descent snapvault snap create oracle sv_daily
# Simultaneously 'push' the primary filer Snapshot copy to the secondary NearStore system.
rsh -l root rook snapvault snap create vault sv_daily
# Remove all affected tablespaces from hot backup mode.
$ORACLE_HOME/bin/sqlplus system/oracle @end.sql
Note that the “@begin.sql” and “@end.sql” scripts contain sql commands to put the database’s tablespaces
into hot backup mode (begin.sql) and then to take them out of hot backup mode (end.sql).

-------------------------

Response:

Ramon:

Thanks for your kind comments on the blog. The issue you are pointing out has to do with the lack of consistency technology (sometimes referred to as consistency groups) on NetApp's arrays. EMC provides this on all array types other than Celerra at this point. (I will not comment on the record about whether Celerra will have consistency groups in a future release. You can come to your own conclusions on that. :0) Consistency groups have been discussed in my blog. With this feature, you can take dirty snaps, clones or such across multiple volumes. That way, you do not have to store all of your datafiles, tempfiles, controlfiles and online redo logfiles in the same volume, which is not in any way best practices compliant.

In terms of going into hot backup mode, I have written many white papers on combining this feature of Oracle with snaps in the past, including many at NetApp. My current program publishes these as well, and the latest version can be found here:
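For the SnapShot-only licensing the commenter describes, the same hot-backup bracketing can be sketched with a plain "snap create" instead of SnapVault. This is a sketch, not a tested NetApp procedure: the filer name "toaster", volume "oravol", and snapshot name "nightly" are placeholders, and DRYRUN=1 prints each step instead of executing it.

```shell
#!/bin/sh
# Sketch only: hot-backup bracketing around a plain snapshot.
# "toaster", "oravol", and "nightly" are placeholder names.
DRYRUN=1
run() { if [ "$DRYRUN" = "1" ]; then echo "$@"; else "$@"; fi; }

# Put the tablespaces into hot backup mode (begin.sql as in the extract).
run "$ORACLE_HOME/bin/sqlplus" system/oracle @begin.sql
# Take the snapshot while the database is in hot backup mode.
run rsh -l root toaster snap create oravol nightly
# Take the tablespaces back out of hot backup mode (end.sql as above).
run "$ORACLE_HOME/bin/sqlplus" system/oracle @end.sql
```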

"Note that the additional blocks required by the snapshot are invading the free space in the active file system. It is actually the light-colored blocks (the “before” images of the blocks) which are held by the snapshot. At NetApp, we used to have debates over whether the snapshot occupied the space, or whether it was the active file system that did so. Whatever. The effect is exactly the same."

The effect is not the same. NetApp provides a means for reserving snapshot area or not reserving snapshot area and allowing it to expand into data area.

If you correct the perception of the goals of snapshots, they still have the advantage.

Thanks

-------------------------------

Response:

BCVs provide read performance advantages: You have double the disks to read from. Further, obviously BCVs also provide instantaneous recovery. In the context of the sentence, the advantage of snapshots over BCVs is space efficiency. If you take away that advantage, BCVs win in my book.

"But, as Bruce Gordon said, “No customer will ever have an ENOSPACE error on my watch.” Bruce attempted to establish a principle that space would always be reserved such that a snapshot could never exhaust the active file system free space."

"Further, you decide how much space you want to allocate to the snapshot. Unlike WAFL-based snapshots, you are not writing a blank check for snapshot overhead, up to the full amount of data in the active file system. Rather, you can decide that the snapshot will only be allowed to take up 10% of that space if you want to. This adds discipline to the whole proposition of snapshot space overhead."

NetApp space reservations are a selectable option. In fact, space can be reserved by percentage and does not have to be all or nothing as with BCVs. Snapshots do not suffer the multiple I/Os of BCVs. That is why NetApp confidently allows up to 256 snapshots, where EMC best practice does not allow that many BCV snapshots due to performance degradation.

True, but misleading. EMC has been moving in the direction of snaps and away from BCVs for a long time. Certainly, we are doing so in the context of my program. My comments are limited to a comparison of which snapshot technology is best: metadata-based, or reserve LUN pool-based. BCVs are beyond the scope of the discussion.

disclaimer: The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.