This issue isn’t specific to Jetstress, Exchange, Microsoft, or a specific fabric type, storage protocol or storage vendor. Exceeding the virtual disk capacities listed above, per host, results in the symptoms discussed earlier and memory allocation errors. In fact, if you take a look at the KB article, there’s quite a laundry list of possible symptoms depending on what task is being attempted:

An ESXi/ESX 3.5/4.0 host has more than 4 terabytes (TB) of virtual disks (.vmdk files) open.

After virtual machines are migrated by vSphere HA from one host to another due to a host failover, the virtual machines fail to power on with the error: vSphere HA unsuccessfully failed over this virtual machine. vSphere HA will retry if the maximum number of attempts has not been exceeded. Reason: Cannot allocate memory.

Adding a VMDK to a virtual machine running on an ESXi/ESX host where the VMFS-3 heap is maxed out fails.

When you try to manually power on a migrated virtual machine, you may see the error: The VM failed to resume on the destination during early power on.
Reason: 0 (Cannot allocate memory).
Cannot open the disk ‘<<Location of the .vmdk>>’ or one of the snapshot disks it depends on.

The virtual machine fails to power on and you see an error in the vSphere client: An unexpected error was received from the ESX host while powering on VM vm-xxx. Reason: (Cannot allocate memory)

A similar error may appear if you try to migrate or Storage vMotion a virtual machine to a destination ESXi/ESX host on which the VMFS-3 heap is maxed out.

Cloning a virtual machine using the vmkfstools -i command fails and you see the error: Clone: 43% done. Failed to clone disk: Cannot allocate memory (786441)

While VMware continues to raise the scale and performance bar for its vCloud Suite, this virtual disk and heap size limitation becomes a real constraint for monster VMs or vApps. Fortunately, there’s a fairly painless resolution (at least up to a certain point): increase the heap size beyond its default value on each host in the cluster and reboot each host. The advanced host setting to configure is VMFS3.MaxHeapSizeMB.
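As a minimal sketch (assuming an ESXi 5.x shell; on classic ESX/ESXi 4.x the equivalent is esxcfg-advcfg), the setting can be changed from the command line and takes effect after a reboot:

    # ESXi 5.x: raise the maximum VMFS heap size to 256MB, then reboot the host
    esxcli system settings advanced set -o /VMFS3/MaxHeapSizeMB -i 256

    # ESX(i) 4.x equivalent
    esxcfg-advcfg -s 256 /VMFS3/MaxHeapSizeMB

The same value can also be changed per host in the vSphere Client under Configuration > Advanced Settings > VMFS3.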

Let’s take another look at the default heap size, this time with the addition of its maximum allowable heap size value:
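On a live host, the current, default, minimum, and maximum values can also be read straight from the advanced configuration (again assuming an ESXi 5.x shell):

    # Lists the Int Value, Default Int Value, Min Value, and Max Value for the setting
    esxcli system settings advanced list -o /VMFS3/MaxHeapSizeMB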

After increasing the heap size and rebooting, the ESX(i) kernel will consume additional memory overhead equal to the amount of the heap size increase. For example, on vSphere 5, increasing the heap size from its 80MB default to the 256MB maximum consumes an extra 176MB of base memory which cannot be shared with virtual machines or other processes running on the host.

Readers may have also noticed an overall decrease in the amount of open virtual disk capacity supported per host in newer generations of vSphere. While I’m not overly concerned at the moment, I’d bet someone out there has a corner case requiring greater than 25TB or even 32TB of powered-on virtual disk per host. With two of VMware’s core value propositions being innovation and scalability, I would tip-toe lightly around the phrase “corner case” – it shouldn’t be used as an excuse for gaps while VMware pushes for 100% data virtualization and vCloud adoption. Short term, the answer may be RDMs. Longer term: vVols.

Updated 9/14/12: There are some questions in the comments section about what types of storage the heap size constraint applies to. VMware has confirmed that heap size and max virtual disk capacity per host applies to VMFS only. The heap size constraint does not apply to RDMs nor does it apply to NFS datastores.

Updated 4/30/13: VMware has released vSphere 5.1 Update 1 and as Cormac has pointed out here, heap issue resolution has been baked into this release as follows:

VMFS heap can grow up to a maximum of 640MB, compared to 256MB in earlier releases. This is identical to the way that VMFS heap size can grow up to 640MB in a recent patch release (patch 5) for vSphere 5.0. See this earlier post.

Maximum heap size for VMFS in vSphere 5.1U1 is set to 640MB by default for new installations. For upgrades, it may retain the values set before the upgrade. In such cases, please set the values manually.

There is also a new heap configuration, “VMFS3.MinHeapSizeMB”, which allows administrators to reserve the memory required for the VMFS heap during boot time. Note that “VMFS3.MinHeapSizeMB” cannot be set to more than 255MB, but if additional heap is required it can grow up to 640MB. This alleviates the heap consumption issue seen in previous versions, allowing ~60TB of open storage on VMFS-5 volumes per host to be accessed.
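As a hedged sketch (assuming an ESXi 5.1 U1 shell), both values could be set as follows:

    # Reserve 255MB of VMFS heap at boot time (the maximum allowed for this setting)
    esxcli system settings advanced set -o /VMFS3/MinHeapSizeMB -i 255

    # Allow the heap to grow to the new 640MB ceiling if additional heap is required
    esxcli system settings advanced set -o /VMFS3/MaxHeapSizeMB -i 640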

When reached for comment, Monster VM was quoted as saying “I’m happy about these changes and look forward to a larger population of Monster VMs like myself.”

I wouldn’t worry about it either in today’s host density. An additional 176MB of overhead per host is a non-issue. However, Monster VMs (now my friend on Facebook – who’s responsible for that account anyway?) could see much larger memory overhead values on a per-VM basis. And no matter how many Monster VMs fit per host, at the end of the day they are still going to require cluster resources to satisfy their individual overhead entitlement values.

That’s a really good question. The KB article solely mentions VMFS, but at the same time it calls out often that it’s an open virtual disk per host issue, which would apply across the board. I will do some more digging and see if I can get an answer on this.

Great article. I too am curious if this also affects NFS. I have several 5.0 hosts that have more than 8 TB of storage open on them and I haven’t had any issues yet… But we are running NFS, so I wonder if NFS is not affected by this.

Anyways, thanks again for a great article and pointing this out to the community.

Gah, this is a serious problem in my eyes. I have a DRV program that requires up-front partitioned space (thick) when on block. With the space retention requirement the client has, the disk requirement is 45TB. A single 45TB drive is preferred. In some storage platforms RDMs are not an option, and multiple VMDKs would be the only choice, then requiring a very large dynamic disk within Windows.

I agree with Roger, this is a good thing to know. I just ran into this issue in a SQL POC. Wanted to stress test the hosts with a failed host. The last 5 VMs could not be moved because of the heap error. We are all VMDKs and all block storage at this time.

This certainly is disconcerting, especially since backing up VMs requires mounting VMDKs to a backup virtual appliance. I wonder if snapshotting and mounting the disks to the backup appliance doubles the amount of “open vmdk file space” for a given VMDK.

Scalability seems to have always been low on the VMware priority list, in my opinion.

Is there any reason “not” to just max out the heap size? I presume there’s a reason VMware set a smaller value. Is memory usage the only con to increasing it? Meaning, is there any performance overhead with increasing the setting?