Where did all my savvol space go?

I noticed last week that some of my Celerra replication jobs had stalled and were not sending any new data to the replication partner. I then noticed that the storage pool designated for checkpoints was at 100%. Not good. Based on the number of file system checkpoints that we perform, it didn’t seem possible that the pool could be filled up already. I opened a case with EMC to help out.

I learned something new after opening this call: every time you create a replication job, a new checkpoint is created for that job and stored in the savvol. You can view these in Unisphere by changing the “select a type” filter to “all checkpoints including replication”. You’ll notice checkpoints in the list with names like root_rep_ckpt_483_72715_1; they all begin with root_rep. After I had worked with the EMC support engineer on the case for a little while, he helped me determine that one of my replication jobs had a root_rep_ckpt that was 1.5TB in size.

Removing that checkpoint would immediately solve the problem, but there was one major drawback: you can’t delete a root_rep checkpoint without first deleting the replication job entirely, which means redoing the replication from scratch. The entire filesystem would have to be copied over our WAN link and resynchronized with the replication partner Celerra. That didn’t make me happy, but there was no choice. At least the problem was solved.

Here are a couple of tips for you if you’re experiencing a similar issue.

You can verify which storage pool the root_rep checkpoints are using by running an info against the checkpoint from the command line and looking for the ‘pool=’ field.

nas_fs -list | grep root_rep (the first column in the output is the ID# for the next command)

nas_fs -info id=<id from above>
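
If you have more than a handful of replication jobs, running the info command one ID at a time gets tedious. Here is a rough sketch that chains the two commands above together. It assumes the first column of the nas_fs -list output is the ID (as noted above) and that the info output has a line containing the word "pool"; the function name list_root_rep_pools is just something I made up for illustration, so adjust it to whatever your system actually prints.

```shell
#!/bin/sh
# Sketch only: print the pool line for every root_rep checkpoint.
# Assumptions: column 1 of `nas_fs -list` is the checkpoint ID, and
# `nas_fs -info` prints a line containing "pool".
# list_root_rep_pools is a hypothetical helper name, not an EMC command.
list_root_rep_pools() {
    nas_fs -list | grep root_rep | while read -r id _rest; do
        # Prefix each matching pool line with the checkpoint ID.
        nas_fs -info id="$id" | grep 'pool' | while read -r line; do
            printf '%s: %s\n' "$id" "$line"
        done
    done
}
```

Paste the function into a shell session wherever you normally run nas_fs, then call list_root_rep_pools with no arguments. It only reads information and does not delete or change anything.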

You can also see the replication checkpoints and IDs for a particular filesystem with this command:

fs_ckpt <production file system> -list -all

You can check the size of a root_rep checkpoint from the command line directly with this command:

Need to quickly figure out which checkpoint filesystems are taking up all of your precious savvol space? Run the CLI command below. Filling up the savvol storage pool can cause all kinds of problems beyond failing checkpoints; it can also cause filesystem replication jobs to fail.