vSphere 5.0 Storage Features Part 1 – VMFS-5

One of the primary objectives of the storage enhancements in 5.0 is to make the management of storage much simpler. One way to do this is to reduce the number of storage objects that a customer has to manage, i.e. enable our customers to use far fewer and much larger datastores. To that end, we are increasing the scalability of the VMFS-5 filesystem. These scalability features are discussed here. In future postings, I will discuss further features which aim to fulfil this vision of simplifying storage management.

VMFS-5 Enhancements

Unified 1MB File Block Size. Previous versions of VMFS used 1MB, 2MB, 4MB or 8MB file blocks, with the larger block sizes needed to create large files (>256GB). These large blocks are no longer required on VMFS-5: very large files can now be created using 1MB file blocks.
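As a rough illustration, VMFS-3's maximum file size scaled linearly with the chosen block size (1MB blocks capped files at 256GB, 8MB blocks at 2TB), which is exactly the constraint the unified 1MB block removes. A quick sketch of that old relationship:

```shell
# VMFS-3: maximum file size grew with the file block size chosen at format time.
# (1MB -> 256GB, 2MB -> 512GB, 4MB -> 1TB, 8MB -> 2TB)
for bs_mb in 1 2 4 8; do
  echo "${bs_mb}MB blocks -> $(( bs_mb * 256 ))GB max file size"
done
```

On VMFS-5, the block size is always 1MB and the maximum file size no longer depends on it.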

Large Single Extent Volumes. In previous versions of VMFS, the largest single extent was 2TB. With VMFS-5, this limit has been increased to ~60TB.

Smaller Sub-Block. VMFS-5 introduces a smaller sub-block. This is now 8KB rather than the 64KB we had in previous versions. Now small files < 8KB (but > 1KB) in size will only consume 8KB rather than 64KB. This will reduce the amount of disk space being stranded by small files.

Small File Support. VMFS-5 introduces support for very small files. For files of 1KB or less, VMFS-5 stores the data in the file descriptor location in the metadata rather than in file blocks. When such a file grows above 1KB, it starts to use the new 8KB sub-blocks. This again reduces the amount of disk space stranded by very small files.
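The three tiers above can be sketched as a tiny allocation model. This is an illustration only (vmfs5_alloc is a hypothetical helper name, not a VMware tool, and it ignores metadata overhead):

```shell
# Approximate VMFS-5 data-block consumption for a file of a given size in bytes,
# per the tiers described above: <=1KB lives in the file descriptor,
# <=8KB uses one 8KB sub-block, anything larger rounds up to whole 1MB file blocks.
vmfs5_alloc() {
  size=$1
  if [ "$size" -le 1024 ]; then
    echo 0                                            # stored in the descriptor itself
  elif [ "$size" -le 8192 ]; then
    echo 8192                                         # one 8KB sub-block
  else
    echo $(( (size + 1048575) / 1048576 * 1048576 ))  # whole 1MB file blocks
  fi
}
vmfs5_alloc 512      # -> 0
vmfs5_alloc 4096     # -> 8192
vmfs5_alloc 3000000  # -> 3145728 (three 1MB blocks)
```

Under VMFS-3 the middle tier would have cost a 64KB sub-block, and the first tier did not exist at all.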

Increased File Count. VMFS-5 introduces support for more than 100,000 files, a three-fold increase over the number of files supported on VMFS-3, which was ~30,000.

ATS Enhancement. The Atomic Test & Set (ATS) Hardware Acceleration primitive is now used throughout VMFS-5 for file locking. ATS is part of VAAI (the vSphere Storage APIs for Array Integration) and will be revisited in a future posting. This enhancement improves file locking performance over previous versions of VMFS.

Here is a vmkfstools output of a newly created VMFS-5 volume showing many of the new scalability characteristics:

A VMFS-5 volume upgraded in place from VMFS-3:

- continues to use the previous file block size, which may be larger than the unified 1MB file block size.
- continues to use 64KB sub-blocks rather than the new 8KB sub-blocks.
- continues to have a file limit of 30,720 rather than the >100,000 file limit of a newly created VMFS-5.
- continues to use the MBR (Master Boot Record) partition type; when the VMFS-5 volume is grown above 2TB, it automatically & seamlessly switches from MBR to GPT (GUID Partition Table) with no impact to the running VMs.
- continues to have its partition starting on sector 128; newly created VMFS-5 partitions have their partition starting at sector 2048.
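If you are unsure whether a given datastore was newly created or upgraded, one way to check from the ESXi shell is to dump its filesystem attributes, which include the VMFS version and file block size. The datastore name below is a placeholder for your own:

```shell
# Print the VMFS version, file block size and related attributes for a datastore
# (replace "mydatastore" with your datastore's name).
vmkfstools -Ph -v10 /vmfs/volumes/mydatastore
```

An upgraded volume will report VMFS-5 but still show its original VMFS-3 block size (e.g. 4MB or 8MB) rather than 1MB.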

RDM – Raw Device Mappings

There is now support for passthru RDMs to be ~60TB in size.

Non-passthru RDMs are still limited to 2TB – 512 bytes.

Both upgraded VMFS-5 & newly created VMFS-5 support the larger passthru RDM.

Misc.

I decided to add this section as I know many of you will have questions about it.

The maximum size of a VMDK on VMFS-5 is still 2TB - 512 bytes.

The maximum size of a non-passthru (virtual) RDM on VMFS-5 is still 2TB - 512 bytes.

The maximum number of LUNs that are supported on an ESXi 5.0 host is still 256.
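The 2TB - 512 bytes figure quoted above works out as follows (plain arithmetic, nothing ESXi-specific):

```shell
# 2TB in bytes, minus 512: the VMDK / non-passthru RDM ceiling on VMFS-5.
echo $(( 2 * 1024 * 1024 * 1024 * 1024 - 512 ))   # -> 2199023255040
```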

These enhancements to the scalability of VMFS should assist in the consolidation of more VMs onto fewer datastores, reducing the number of storage objects that an administrator has to manage, and in turn making storage management that little bit easier in vSphere.

Recommendation

If you have the luxury of doing so, I would recommend creating a new VMFS-5 filesystem rather than upgrading VMFS-3 to VMFS-5. Storage vMotion operations can then be used to seamlessly move your VMs to the newly created VMFS-5. This way, you will enjoy all the benefits that VMFS-5 brings.

Bad Dos

Well, in 5.0, the theoretical maximum number of powered-on virtual machines that we document/support per VMFS volume is 2048. However, there are many considerations to take into account when scoping the number of VMs per datastore, such as the capabilities of the underlying storage & the IOPS and latency requirements of the VMs themselves. All of these would have to be taken into account before a VM:datastore ratio could be calculated.

Chris

July 14th, 2011

Fantastic! Almost can't wait to get started on upgrading! 🙂

anjaneshbabu

July 16th, 2011

Interesting – how disk size has been increased to 60TB without a change in VMDK size. This coincides with the move to GPT disks rather than MBR, which perhaps means that the data starts spanning VMDK files once the disk size exceeds 2TB (is this a band-aid patch?). Wonder what happens to our existing SAN – this would perhaps need to be re-certified for VMFS-5 …

Hi Anjaneshbabu, as you mention, the VMDK size is still 2TB - 512 bytes & we need GPT to address larger partition sizes. I’m not sure I follow your comment about data spanning VMDK files. I guess the only way this would happen is if multiple VMDKs were presented to a VM, and the Guest OS used some software RAID technology. However, this is a function of the Guest OS, and would not use any feature of the VMkernel to achieve this. I’m not sure about the requirement to recertify your SAN. The best thing to do is to check the HCL when 5.0 releases.

Hi Paul, are you referring to the experimental vmfs-undelete script that appeared in ESX 3.5 as per http://kb.vmware.com/kb/1007243?
If so, then the answer is no. This script was unsupported in ESX 4.x, and is also not available in ESXi 5.0.

Singh

July 29th, 2011

So what is the point of VMFS5?
I can get VMFS3 to address 20TB of space already by just creating multiple 2TB disks within my server’s RAID controller and expanding them within the ESX Client.
I mean, if the VMDK sizes are still limited to 2TB, the only major point to VMFS5 is that I don’t have to create multiple 2TB virtual disks within the hardware RAID controller anymore. All that to save 10 minutes of extra work?
I was hoping to grow my VMDKs beyond 2TB… disappointing…

Dan

July 30th, 2011

Cormac Hogan, the difference is no more messy extents and the ability to structure with fewer LUNs … which would be desirable if you’ve adopted the tiered storage that’s being rolled out by most vendors.

Dennis

Hi Dennis,
There are no tools shipped with the ESXi, but there are internal tools available to diagnose datastore issues. If you suspect that you have a damaged datastore, open a Service Request with GSS for assistance & diagnosis.

Dennis

August 12th, 2011

Only for diagnosis, or also for repair? Since we lost data caused by a corrupt VMFS, we are using only vRDMs for all data.

Hi Dennis,
Sorry to hear about your experience, but yes, GSS has the expertise to diagnose and repair in certain scenarios, though obviously their ability is limited. It all depends on the type of issue.

Nati

August 17th, 2011

So, what is the proper way to get a larger than 2TB partition in a VM (Windows)? I’m concerned about performance. I was hoping to be able to introduce an 8TB partition to a few VMs, but I’m not sure what is the best way to do it while avoiding all sorts of pitfalls, like a VM OS corruption leaving the volume useless.
Also, some of my current VMFS-3 stores are very low on space and some warnings are showing in vSphere. I wonder if they will upgrade to VMFS-5, or whether there are different free space requirements that will fail the upgrade.

Hi Nati,
You have two choices here – multiple 2TB VMDKs assigned to the same VM, using the Guest OS Volume Manager to build an 8TB volume, OR you can use passthru RDMs passed directly into the Guest. PT RDMs can now be much larger than before, as mentioned in the blog.
Good question on the amount of free space. In order to upgrade from VMFS-3 to VMFS-5, you need at least 2 free file blocks and 1 free inode. If these are not available, the upgrade will not succeed.

JD

September 23rd, 2011

Are queue lengths (file locking) still an issue with VMFS-5? Have always had issues with multiple high-performance VMs in the same datastore queuing up commands and making overall performance suffer for everything in the datastore.. (Which is why we went to NFS)

Hi John, thanks for commenting.
Have you looked at the Storage I/O Control feature, which has been in vSphere since 4.1? It addresses exactly the issue you describe by allowing you to prioritize VMs and assign each VM a certain amount of bandwidth to each datastore when contention arises.

Blair

October 4th, 2011

The big thing for me about the new VMFS-5 is the change to the block size. We had a major meltdown of our vSphere 4 system. Our vendor configured our SAN RAID controllers with 256Mb stripes and an 8MB block size to support the 2TB VMDK. This resulted in a massive read overhead. Each 8MB block takes up 6.4 stripes. With our SAN we had 6 disks that had to be scanned 6 times, with 2 being scanned 7 times. This is just under 40 scans to access a single 8MB block. We were getting read latency in the seconds!!! I am hoping the 1MB block size of vSphere 5 will alleviate this while still allowing the 2TB VMDK.
Our problem was the interaction of the larger-than-normal VMFS block size with the underlying RAID stripe size and the unintended consequence for disk latency.
Moral of the story: small configuration choices can make HUGE impacts.

jmayes

January 20th, 2012

VMFS-3 allowed a maximum of 8 systems to mount a volume concurrently. Has this limitation been changed in VMFS-5? And, secondly, can VMFS-5 be used on ESXi 4.1, or does the system have to be upgraded to ESX 5 first?

This limitation has not changed in VMFS-5/vSphere 5.0. However, it is high on our agenda to address in a future release, since it has a direct impact on both our View & vCloud Director products. Both of these products use linked clones for provisioning VMs and would benefit from a higher number of hosts sharing a file.
On your second point, VMFS-5 volumes are only recognised by ESXi 5.0 hosts. ESX hosts which wish to use VMFS-5 will need to be upgraded to ESXi 5.0. vCenter will not allow you to upgrade a VMFS-3 to VMFS-5 unless it detects that all hosts accessing the datastore are running ESXi 5.0.

tom miller

January 25th, 2012

Chogan,
Thanks for the info.
Getting ready to upgrade a client from ESX4 to ESX5, including storage and new ESX servers. I was going to zone one new ESX5 server to see both the VMFS-3 with an 8MB block size and the new storage with VMFS-5, and use that server as a swing box to Storage vMotion VMs from VMFS-3 to VMFS-5. Is there an issue Storage vMotioning a VM sitting on a VMFS-3 8MB block size to a VMFS-5 1MB block? In the past with VMFS-3 you could not Storage vMotion a VM to a VMFS-3 with a smaller block size.

Wally

February 18th, 2012

Just checking:
We upgraded a 2047GB (4MB block size) VMFS-3 volume to VMFS-5. We grew the LUN to 3072GB on our storage system. All connected ESXi 5 hosts see that the LUN has grown, but none of them wants to extend the volume. Extending it with vmkfstools --growfs doesn’t work either. Are we doing something wrong, or is this an unmentioned limitation?

Wally,
You should certainly be able to grow the upgraded VMFS-5. Rather than troubleshooting this issue in the comments, please file an SR with our support folks who will be able to help you with this issue.
Cormac

Christoph Herdeg

February 20th, 2012

Hi there…
If the max. VMDK size can only be 2TB minus (!) 512 bytes, why t** f*** does the “add disk wizard” not simply USE 2TB minus 512 bytes instead of trying to set 2TB minus nothing,nill,NULL, and, of course, failing?
This is such an annoyance, guys. I have to create such VMDKs sometimes, perhaps once a month. So please: I don’t want to enter 2TB in MB minus 1MB (btw.: 2047999 MB) every single time. If that wizard can set 2TB to a filesystem that does not support 2TB, so why can I enter 2TB???
CHANGE THAT!

Christoph,
Which version of vSphere are you using?
I just tested this with my 5.0 environment, and it would appear that the wizard creates a VMDK of 2TB - 1KB.
The actual size of the VMDK created is 2199023254528 bytes.
If we take 2TB to be 2*1024*1024*1024*1024, this gives us 2199023255552.
This is 1024 bytes greater than the size of the VMDK created (2199023254528).
So the size limitation is indeed taken into account.
Cormac
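The arithmetic in the reply above can be checked directly; the 2199023254528 figure is taken from the reply itself, not independently measured:

```shell
two_tb=$(( 2 * 1024 * 1024 * 1024 * 1024 ))  # 2199023255552
created=2199023254528                         # wizard-created VMDK size from the reply
echo $(( two_tb - created ))                  # -> 1024, i.e. the VMDK is 1KB short of 2TB
```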

Ward

February 27th, 2012

Hello,
I was wondering: if you had two datastores, one running VMFS-3 (with all hosts) and a new datastore with VMFS-5, before you can Storage vMotion from 3 to 5, do you have to upgrade the 3? Or can you go straight from 3 to 5?
Thanks.

Ward

February 28th, 2012

Hello,
We are currently running a datastore at VMFS-3 and will be creating a new VMFS-5 store. We want to Storage vMotion all VMs off the 3 version to 5. Do we need to upgrade the old VMFS-3 before we can do this, or will svMotion be able to move the data from VMFS-3 to VMFS-5?
Thanks!!

thfs

vSphere 5.0 (and earlier) still only allows a VM 45 RDM LUNs (SCSI0:0-0:15 must be virtual hard disks, while SCSI1:0-3:15 can be raw LUNs).

Derek

August 1st, 2012

Is there a minimum virtual machine hardware version for VMFS-5?

Martin Smith

August 2nd, 2012

ok, this sucks… you introduced a new filesystem and the main limitation is exactly the same…
THIS:
“The maximum size of a VMDK on VMFS-5 is still 2TB -512 bytes.”
SHOULD BE THE FIRST LINE IN THIS ARTICLE.
the second line could be something like:
We failed, but will try better with VMFS6

scott

August 19th, 2012

With regard to feedback on >2TB VMDKs: I also posted in the blog Cormac mentioned in the last post.

Another example of needing VMDKs larger than 2TB: a Backup Exec 12 deduplication store. You are only allowed one store per server. I need to back up about 15TB of data. In my primary datacenter I have a SAN, so I can present a 15TB RDM LUN, but in our offsite disaster site I only have DAS in the ESXi server, and while I can create a 15TB datastore from the 15TB RAID set, I still need to create 7 2TB VMDKs and extend them to create one disk using Windows Disk Management. Not ideal.

Tony

September 19th, 2012

After upgrading the datastore to 5, can we expand the VMDK size (block size of 1MB) past 256GB? In ESXi 4.1, if you didn’t change the default block size from 1MB, it limited the size of the disk for the VM. Can we expand this by simply upgrading the datastore to 5?

Is the only way to “freshly format” a VMFS LUN partition to “delete” the LUN on the SAN and then re-add it? vmkfstools doesn’t want to do anything with it as it is “busy”. I can’t unmount it either. I have removed all VMs and all files from the datastore. I went ahead and ran “upgrade to VMFS-5”. It is VMFS-5, but it still has an 8MB block size.

The datastore contains twelve 2TB VMDKs. Today I added two 2TB VMDKs to one virtual machine and tried to start another VM, but ESXi said:
——————
Reason: 0 (Cannot allocate memory).
Cannot open the disk ‘/vmfs/volumes/502a3daf-232efbdc-2018-5cf3fcb9b28e/archive/archive_3.vmdk’ or one of the snapshot disks it depends on.
——————-

I read that VMFS-5 can use up to 60TB via 32 2TB VMDKs. Is that true, or is VMFS-5 limited to 25TB for the maximum number of 2TB VMDKs?

There are some projects going on internally to see what can be done to overcome this limitation.

I recommend speaking to a support representative to get the latest advice on this matter.

DH

April 5th, 2013

Now that you are not allowed to create a volume with anything but a 1MB block size, how do you shrink a thin-provisioned VMDK if your environment is all VMFS-5? We made the mistake of migrating off all of our VMFS-3 volumes after upgrading all of our hosts to ESXi 5.1, forgetting that one of our LUNs was intentionally kept at a 2MB block size on the old setup so that we could use it for shrinking guests that had ballooned; now there doesn’t seem to be any way to do this.

For those not familiar with the process: say you have a 100GB guest that only has 10GB of real data in it, but due to something going wrong (temporary files, clean-up performed, etc.) that 100GB had been used at one point and now only 10GB is used. Well, SAN storage is expensive and you would like the wasted 90 gigs back. In the ESXi 4.1 and VMFS-3 world, you could create a volume with a block size that was different from the volume the guest currently lives on. In our environment all of our LUNs were 1MB except for one we kept at 2MB just for this purpose. When we were getting a bit low on space and found guests like the one in question, we’d run a simple command to write zeroes to all of the free space:

cat /dev/zero > bigfile ; rm -f bigfile

Then all you have to do is storage vMotion the guest to the 2 MB volume and back to its normal home and suddenly the VMDK goes from 100 gigs down to 10 gigs.

Without support for creating a file system with a different block size, now I have no way to reclaim this space in a manner that doesn’t require an outage for the guest.

Ken

May 3rd, 2013

This is a question I had as well, and I hope there is a good answer for an alternative to this. With proper VMware Tools support, you’d think this feature would have been added to the core product already.

Ken

May 3rd, 2013

Actually, never mind. From the host connected to the datastore, you can run vmkfstools -K (or --punchzero) ./volume.vmdk after zeroing out the free space in the OS. This removes the zeroed space for you, leaving you with a lean, thin-provisioned VMDK. The VM has to be off when you do this. I haven’t found an equivalent in the GUI, but this is easy enough to do. NO DATASTORE MIGRATIONS NEED TO HAPPEN AT ALL FOR THIS TO WORK. 🙂
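Pulling the two comments above together, the reclaim workflow looks roughly like this. The paths are placeholders, step 1 runs inside a Linux guest, and step 2 runs on the ESXi host with the VM powered off:

```shell
# Step 1 - inside the (Linux) guest: fill free space with zeroes, then delete the file.
cat /dev/zero > bigfile ; rm -f bigfile

# Step 2 - on the ESXi host, with the VM powered off: punch out the zeroed blocks,
# shrinking the thin-provisioned VMDK back down to its real data footprint.
vmkfstools --punchzero /vmfs/volumes/mydatastore/myvm/myvm.vmdk
```

No Storage vMotion or second datastore is needed, which is what makes this approach viable on an all-VMFS-5 environment.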
