My general stuff. This will generally include computer related information, but may also contain general rantings on my part.

Tuesday, April 21, 2009

Celerra NFS and VMware testing

We just received an EMC Celerra NS-G8, and it is my job to implement NFS serving VMware. Beyond the standard "get away from VMFS" argument, a few features piqued my interest: thin provisioning and deduplication.

Thin provisioning was a big letdown. If you are using NFS, you have some degree of thin provisioning by default. Additionally, almost any VMware function that touches the disks (Storage VMotion, cloning, deploy from template) will bloat the VMDK to full size. I did find a way to thin out the VMs (I called it a treadmill process), but it's not seamless and requires hours of downtime. I still have this feature enabled, but don't expect great things from it.

Deduplication was a bit of a surprise to me, since I didn't think this feature was available until I got the system. My previous experience with deduplication was with EMC Avamar, which is block-level deduplication that allows for over 90% deduplication rates. Celerra deduplication, however, is file-level, meaning only fully duplicate files are freed up. I have worked with Exchange and Windows Single Instance Storage before, so this is a great item for file servers where the same file may exist dozens or hundreds of times, but no two VMDKs are ever going to be alike.

Celerra deduplication, however, also does compression, something that may be very useful if it can compress the zero blocks in a VMDK. To test this I created a "fat" VMDK and copied it to an NFS datastore, then initiated the dedupe process and compared the size differences.

Step 1: Create the bloated VMDK

The first thing needed is to create a bloated/fat/inflated disk to test against:

SSH into the VMware host

cd to /vmfs/volumes/

Create the disk: vmkfstools -c 50G foo.vmdk -d eagerzeroedthick

The size of the disk can be confirmed by executing ls -l and by viewing it in the Datastore Browser; make sure both locations list it as a full 50 GB (to ensure that thin provisioning isn't affecting us).
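Run together, the step looks like this (the datastore name is omitted here just as above; foo.vmdk is the test disk name from the post):

```shell
# SSH into the VMware host, then change into the NFS datastore
cd /vmfs/volumes/        # append the target datastore name here
# Create a fully allocated ("eager zeroed thick") 50 GB test disk
vmkfstools -c 50G foo.vmdk -d eagerzeroedthick
# Confirm the disk reports its full 50 GB size
ls -l foo.vmdk
```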

Step 2: Change the dedupe parameters

By default, deduplication is limited to files that meet the following requirements:

Haven't been accessed in 30 days

Haven't been modified in 60 days

Are larger than 24 KB

Are smaller than 200 MB

To test the dedupe process, we need to change these using the server_param command. To see the current settings, SSH into the Celerra and run server_param server_2 -facility dedupe -list, which lists all the deduplication settings. A setting can then be changed by running server_param server_2 -facility dedupe -modify <attribute> -value <value>. In my case I need to set the access and modified times to 0, and the maximum size to 1000.
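Put together, the step looks like this on the Celerra (server_2 is the Data Mover from my setup; take the exact attribute names from the -list output rather than guessing them):

```shell
# Show all current dedupe settings for the Data Mover
server_param server_2 -facility dedupe -list
# Change one setting at a time; the attribute name comes from the list above
server_param server_2 -facility dedupe -modify <attribute> -value <value>
```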

Step 3: Initiate the dedupe process

Every time a file system is configured for deduplication, the dedupe process is triggered - meaning we can start a dedupe job manually by telling the file system to enable deduplication (if that makes sense). There are two ways we can do this: via the web console, or via the command line.

To kick off a dedupe job via the web console, browse to the File Systems node and open the properties for the target file system. In the File System Properties page, set Deduplication = Suspended and click Apply. Then set Deduplication = On and click Apply. As soon as dedupe is set to on, a dedupe job will be initiated.

To kick off a dedupe job via the command line, SSH into the Celerra and run fs_dedupe -modify -state on. This will automatically start a deduplication job. To view the status of the job, run fs_dedupe -info.
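The command-line version of the step, in one place (run on the Celerra Control Station; supply the target file system wherever your Celerra's syntax calls for it):

```shell
# Enable dedupe on the file system, which also kicks off a dedupe job
fs_dedupe -modify -state on
# Watch the job's progress
fs_dedupe -info
```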

Step 4: Compare the results

Initiating a dedupe on a file system with only VMDKs ultimately results in zero gain. Even with the disks being completely blank, compression doesn't seem to come into play - meaning a big waste of time in testing it.

Additional testing of dedupe with other files (ISOs, install files, home folders, etc.) shows that dedupe works properly at the file level, but not for VMDKs.
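The comparison itself boils down to apparent size versus blocks actually allocated. A quick generic Linux sketch of that difference (not Celerra-specific; the file name is hypothetical), using a sparse file as a stand-in for a thinned/deduped VMDK:

```shell
# Create a 50 GB sparse file as a stand-in for a thin/deduped VMDK
# (hypothetical name; on a real NFS datastore this would be the -flat.vmdk)
dd if=/dev/zero of=/tmp/foo-flat.vmdk bs=1 count=0 seek=50G
# Apparent size - what ls and the Datastore Browser report
ls -lh /tmp/foo-flat.vmdk
# Blocks actually allocated on disk - what thin provisioning/dedupe saves
du -h /tmp/foo-flat.vmdk
rm /tmp/foo-flat.vmdk
```

If du reports far less than ls, the file is holding holes/compressed blocks rather than real data, which is exactly what the dedupe job failed to achieve for eager-zeroed VMDKs.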

2 comments:

The problem with Celerra de-duplication/compression is that even if your VMDK was optimized by it - if you try to start the VMDK afterwards and cause any change in the VMDK file - the whole VMDK will have to be decompressed to its full, un-optimized form before you can do any I/O to the file. The Celerra white paper states 8 MB/sec decompression, which means forever in typical VMDK sizes. http://www.emc.com/collateral/hardware/white-papers/h6265-achieving-storage-efficiency-celerra-wp.pdf