Wednesday, January 04, 2006

Clone OS - ESX Server

* Don't forget uuid.action = "keep", especially if this is a Win32 system.

When the clone is going to a different ESX Server, the one added step to the process below is copying the VM's "personality data" (.vmx, nvram, etc.) to the remote target system with scp (secure copy, i.e. copy over SSH).

I can't recall offhand whether I've posted this before, so... What follows is a detailed process for cloning (and, in the process, relocating) a VM.

The "uuid.bios" value is used in some cases by the Guest OS (via a DMI/SMBIOS query to the VM's BIOS) as part of determining its unique identity. You should only allow the "uuid.bios" value to change in cases where you wish to trigger a change in personality. In Microsoft Windows-ese, this would be when you intend to run things like SysPrep or NewSID on the Guest OS.

The "uuid.location" value is used to determine the uniqueness of a VM at the VMKernel level, and will change any time the VM's .vmx is renamed or relocated. The "uuid.bios" and "uuid.location" values start off identical, but may differ at some later point in time (per the above conditions).

*****

The basic steps involved in cloning a VM on a standalone ESX Server chassis are as follows:

1) Produce "master" VM image

From this point on, leave the "master" VM image in a powered-off state to preserve it.

2) Copy "master" to "clone master"

Preferably, make a copy of the .dsk file, and create a new directory for copies of the .vmx and nvram files. Adjust the .dsk pointer and description data in the .vmx file as appropriate. ADD the following line to the .vmx file:

uuid.action = "keep"

This will force the "uuid.bios" value to remain consistent, even if the VM's .vmx file has been relocated or renamed.

This is equivalent to answering the UUID question with "Always Keep".

3) Register "clone master" VM

vmware-cmd -s register /path/to/clone/file.vmx

4) Depersonalize the VM as appropriate for the Guest OS (Win32 SysPrep, etc...)

5) Copy "clone master" for deployment

Make a copy of the .dsk file, and create a new directory for copies of the .vmx and nvram files. Adjust the .dsk pointer and description data as appropriate in the .vmx file. REMOVE the following lines from the .vmx file:

Ethernet0.generatedAddress = "....."

Ethernet0.generatedAddressOffset = "....."

uuid.location = "....."

uuid.bios = "....."

uuid.action = "keep"

6) Register new VM

vmware-cmd -s register /path/to/new/file.vmx

7) Power on new VM from the MUI directly

A fresh "uuid.location" and "uuid.bios" set will be populated into the .vmx file at this point. These values are then used as part of the hashing algorithm to generate the new ethernet MAC for the VM.

The SMBIOS inside the VM will have a "system serial number" with the new UUID as well, allowing repersonalization as a new unique entity.

**

The "clone master" copy exists to preserve the "master" VM in case errors are encountered during depersonalization.
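If you do this often, the .vmx edits in steps 2 and 5 can be scripted. What follows is a minimal sketch, assuming a POSIX shell with GNU grep on the Console OS; mark_uuid_keep and strip_identity are hypothetical helper names, not VMware tools. Keys are matched case-insensitively, and strip_identity removes the generatedAddress lines for every ethernetN adapter rather than just Ethernet0 (an assumption for multi-NIC VMs). The file is rewritten in place so its owner and permissions stay as they were.

```shell
#!/bin/sh
# Sketch: .vmx edits for steps 2 and 5 (hypothetical helper names).

# Step 2: append uuid.action = "keep" unless some uuid.action line exists.
mark_uuid_keep() {
    vmx="$1"
    if ! grep -qi '^[[:space:]]*uuid\.action[[:space:]]*=' "$vmx"; then
        # Make sure the new line doesn't get glued onto an unterminated last line.
        if [ -n "$(tail -c 1 "$vmx")" ]; then echo >> "$vmx"; fi
        echo 'uuid.action = "keep"' >> "$vmx"
    fi
}

# Step 5: drop the MAC and UUID lines (and uuid.action) so that fresh
# values are generated at the next power-on.
strip_identity() {
    vmx="$1"
    tmp="$vmx.tmp.$$"
    # ethernet[0-9]+ is an assumption extending the Ethernet0 lines to every adapter.
    grep -vi -E '^[[:space:]]*(ethernet[0-9]+\.generatedaddress(offset)?|uuid\.(location|bios|action))[[:space:]]*=' "$vmx" > "$tmp"
    rc=$?
    # grep -v exits 1 when no lines survive; only 2 or more is a real error.
    if [ "$rc" -gt 1 ]; then rm -f "$tmp"; return 1; fi
    cat "$tmp" > "$vmx"   # rewrite in place to keep the owner and permissions
    rm -f "$tmp"
}
```

For example, run mark_uuid_keep /path/to/clone/file.vmx before registering the clone master in step 3, and strip_identity /path/to/new/file.vmx before registering the new VM in step 6.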

*****

Note that you may also want to set or adjust two other values:

suspend.directory = "....."

checkpoint.cptConfigName = "....."

These two values are concatenated together in order to determine a unique location to store the .vmss VM Saved State file during a "Suspend" operation. The default value (if unset) for "suspend.directory" is the VMFS2 Volume where the lowest numbered virtual disk is stored (scsi0:0 typically). The default value (if unset) for "checkpoint.cptConfigName" will be an arguably-unique hash based on the Virtual Machine's "displayName" variable and ten hexadecimal digits.

*****

If you cannot "Suspend" an already running VM because the location given by the "suspend.directory" setting in its .vmx configuration file lacks the disk space, a workaround exists:

While the VM is running, query for current suspend.directory state:

vmware-cmd /path/to/your/VM.vmx getconfig suspend.directory

Determine which VMFS2 volume has sufficient space for the .vmss VM Saved State file (it will be slightly larger than the maximum memory allocated to that VM).
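A rough pre-check for the space question can be scripted too. This sketch assumes the VM's memory is given by the memsize key (in MB) in its .vmx file, and uses an arbitrary 10% margin to stand in for "slightly larger"; suspend_space_ok is a hypothetical name. It relies on df -Pk, and the stock df may not report VMFS volumes on every ESX Server release, so substitute whatever free-space tool your Console OS provides if needed.

```shell
#!/bin/sh
# Sketch: is there room in DIR for a .vmss about the size of the VM's memory?
# Assumptions: memory comes from the .vmx "memsize" key (MB), and a 10%
# margin (arbitrary) stands in for "slightly larger than max mem".
suspend_space_ok() {
    vmx="$1"; dir="$2"
    # Split on double quotes: memsize = "256"  ->  $2 is 256
    mem_mb=$(awk -F'"' 'tolower($1) ~ /^[ \t]*memsize[ \t]*=[ \t]*$/ { print $2; exit }' "$vmx")
    case "$mem_mb" in
        ''|*[!0-9]*) echo "no numeric memsize in $vmx" >&2; return 2 ;;
    esac
    need_kb=$(( mem_mb * 1024 * 11 / 10 ))
    # df -Pk: POSIX format in 1K blocks; field 4 of the data line is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    case "$avail_kb" in
        ''|*[!0-9]*) echo "could not read free space for $dir" >&2; return 2 ;;
    esac
    echo "need ${need_kb} KB, available ${avail_kb} KB in $dir"
    [ "$avail_kb" -ge "$need_kb" ]
}
```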

Suspend your VM. Wait until the suspend operation has completed successfully. Detach the VMware Remote Console session(s) for this VM at this time.

Adjust the "suspend.directory" in the .vmx configuration file to match the new value used above.

Reconnect to your VM using the VMware Remote Console and resume the VM at your leisure. Note that reverting to the original "suspend.directory" value requires another suspend/resume cycle, using the same procedure as above.
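The .vmx adjustment in the procedure above (made while the VM is suspended and no Remote Console is attached) can be done with a small helper. This is a minimal sketch; set_vmx_key is a hypothetical name, it assumes the usual key = "value" line format, and it replaces any existing line for the key (matched case-insensitively) with a single new line at the end of the file.

```shell
#!/bin/sh
# Sketch: set (or add) one key in a .vmx file, e.g. suspend.directory.
# Hypothetical helper; only edit the file while the VM is suspended or off.
set_vmx_key() {
    vmx="$1"; key="$2"; value="$3"
    tmp="$vmx.tmp.$$"
    # Copy every line except ones whose key (text before "=") matches $key.
    awk -v k="$key" '
        { line = $0; sub(/^[ \t]+/, "", line)
          split(line, parts, "=")
          name = parts[1]; sub(/[ \t]+$/, "", name)
          if (tolower(name) != tolower(k)) print $0 }' "$vmx" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Append the new setting.
    printf '%s = "%s"\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$vmx"   # rewrite in place to keep the owner and permissions
    rm -f "$tmp"
}
```

For example, set_vmx_key /path/to/your/VM.vmx suspend.directory /vmfs/otherVolume, where otherVolume is a hypothetical stand-in for whichever VMFS2 volume you picked. Running it again with the original value covers the reversal case.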

*****

Relocating a VM from one directory to another (be it on the same physical chassis or not) is reasonably straightforward. The big caveats are that you should be well aware of the risks (re: UUID warnings), and that you should reconfirm the configuration and device mappings for the VM if it is being moved between chassis.

A single example snippet of shellscript is included below. This example assumes that 'root' will own all VMs. Ideally, you should use a 'role account' (or VMware VirtualCenter) rather than 'root' on the ESX Server, so you can better manage access controls / permissions across your pool of Virtual Machines. Details on structuring 'role accounts' for a non-VirtualCenter environment follow below.

*****

The process for splitting .vmx files off into individual directories in your environment is as follows:

* Shut down the VM you are going to move. Perform a clean, Guest-OS level shutdown, letting the VM power down completely.

* Run the bit of shellscript /as root/ on the Console OS for that particular VM (see bottom of email). You will be prompted for the root password twice (unregister, register). A result code of "1" from the unregister and register commands means "success". The shellscript makes a permissions-matched copy of the configuration (.vmx) file rather than moving it. This allows you to keep a "clean" copy of the .vmx file in case your migration process is interrupted before the unregister/register cycle is completed.

* Start up the VM from the /MUI/ using the Green Triangle "play" button. You will see the VM's status change to "Waiting for Input...".

* Click on "Waiting for Input...", and answer "Always Keep". This will fix the UUID data in the .vmx file to reflect the new location for the configuration file (uuid.location=...), but will -NOT- change the UUID/SMBIOS data for the VM on the Guest OS side (uuid.bios=...). This is precisely the behavior that you want.

* Confirm that the VM has started up correctly. You'll notice that each VM now has 'vmware.log' and 'nvram' files in its own directory.

* Optionally, delete the original .vmx file from the original location.
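The shellscript referred to above isn't reproduced in this section, so here is an independent minimal sketch of the same copy / unregister / register cycle; it is not the original script. relocate_vmx is a hypothetical name, as is the VMWARE_CMD override, which defaults to vmware-cmd and exists only so the flow can be dry-run with echo. It only copies the .vmx; as the steps above note, vmware.log and nvram show up in the new directory once the VM starts.

```shell
#!/bin/sh
# Sketch of the split-off step: copy the .vmx into its own directory, then
# unregister the old path and register the new one. Run as root on the
# Console OS with the VM powered off. VMWARE_CMD is a hypothetical override.
relocate_vmx() {
    old_vmx="$1"; new_dir="$2"
    cmd="${VMWARE_CMD:-vmware-cmd}"
    new_vmx="$new_dir/$(basename "$old_vmx")"

    if [ ! -f "$old_vmx" ]; then echo "no such file: $old_vmx" >&2; return 1; fi
    if [ -e "$new_vmx" ]; then echo "refusing to overwrite $new_vmx" >&2; return 1; fi

    mkdir -p "$new_dir" || return 1
    # Copy, not move: the original stays behind as a clean fallback.
    # -p keeps the mode, ownership (when run as root) and timestamps.
    cp -p "$old_vmx" "$new_vmx" || return 1

    # Each of these should report a result of 1 on success.
    $cmd -s unregister "$old_vmx"
    $cmd -s register "$new_vmx"
}
```

A dry run such as VMWARE_CMD=echo relocate_vmx /path/to/your/VM.vmx /path/to/new/dir still performs the copy, but only prints the two vmware-cmd invocations instead of running them.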

I'll provide you with a real-world example so you can see how it gets put to use:

First, this example assumes you are using the regular Unix permissions model, with users and groups defined locally on the Console OS.

You have one SuperUser (root). This user should not be used to add Virtual Machines. It may be used to delete Virtual Machines, and make system-wide changes. Unix model: SuperUser.

You have (at least) one 'role' account which handles addition/modification/deletion of the _Virtual Machine_ layer. This 'role' account is rarely used. Unix model: read-write-execute-own.

You have several users who are allowed to control VMs owned by their group's role account as though each VM were a system on their desk. They can change power state, connect/disconnect devices, etc. They may not, however, modify the hardware connected to that VM, which includes changing what the CDROM and Floppy drives point at. Unix model: read-execute-group. If you add the 'write' permission to the .vmx file as well, they can modify the VM.

You may also have a third category of everyone else (people outside of that group) who are allowed to check system configuration/health data on the Web Management Interface, but are not allowed to make any changes (power state, devices, etc.). Unix model: read-only-other.

This follows the standard Unix model of User / Group / Other (root is a special case).
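The three tiers above map directly onto an octal file mode: owner rwx (7), group r-x (5), other r-- (4), i.e. 754, or 774 when the group is also given write. A minimal sketch of applying that to one .vmx file follows; apply_vmx_perms is a hypothetical name, and the owner and group you pass (e.g. vmadmin and its group) are placeholders.

```shell
#!/bin/sh
# Sketch: apply the User/Group/Other model above to one .vmx file.
#   Owner (role account):  rwx -> 7
#   Group (control users): r-x -> 5  (rwx -> 7 lets them modify the VM)
#   Other (everyone else): r-- -> 4
apply_vmx_perms() {
    vmx="$1"; owner="$2"; group="$3"; group_may_modify="${4:-no}"
    if [ "$group_may_modify" = yes ]; then mode=774; else mode=754; fi
    # chown needs root; it is skipped when no owner is given.
    if [ -n "$owner" ]; then
        chown "$owner:$group" "$vmx" || return 1
    fi
    chmod "$mode" "$vmx"
}
```

As root, something like apply_vmx_perms /path/to/your/VM.vmx vmadmin vmadmin would give the role account ownership and leave the control users with power-state access only.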

Assuming your shared space is in /home/vm/...

Create a 'role' account (ex: vmadmin). The command is all one line; I've wrapped it for readability. In this example, role-type accounts are in the 5xxx range for UID/GID.

Add the control users (ex: susan, james, terry). You'll disable their shell access unless they need direct access to the Console OS side. In this example, regular user accounts are in the 6001+ range for UID/GID.