Thursday, October 15, 2009

NetApp mbrscan and mbralign for Virtual Machine Alignment In-Depth

Alignment of VMware virtual machines has been an issue for quite some time. This issue exists no matter who your storage vendor is, but I will use NetApp because it is what I know. Here are some links to get you up to speed in case you don't fully understand the situation:

What has NetApp done about the situation? I hear that NetApp will shortly be releasing a tool that plugs into vCenter, specifically for vSphere 4.0 and ESXi 4.0. In the meantime, if you are still on ESX 3.5 (or on vSphere with a Service Console), there is another answer you can use today. It will not work on ESXi, since the tools need a service console.

Eric Forgette at NetApp created a set of tools about a year ago that have matured and found their way into the NetApp VMware Host Utilities Kit version 5.1. If you do not have this kit loaded on your ESX/vSphere servers and you are connecting to NetApp storage, go load it NOW! (A reboot is required for the settings to take effect.) Most of the following comes from my own research combined with conversations directly with Eric in this thread on NetApp Communities.

The kit includes two executables: mbrscan (scans a vmdk for alignment) and mbralign (performs the alignment). The default installation location is /opt/netapp/santools.
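To make "scans a vmdk for alignment" concrete, here is a minimal Python 3 sketch of the idea, not of mbrscan's actual code. It assumes a monolithic -flat.vmdk (a raw image whose first 512-byte sector is the guest's MBR) and a 4 KB storage block size; the function names are hypothetical. A partition counts as aligned when its starting byte offset, start LBA * 512, is a multiple of 4096.

```python
import struct
from typing import List, Tuple

SECTOR_SIZE = 512   # bytes per sector, as addressed by the MBR
BLOCK_SIZE = 4096   # assumed 4 KB storage block size
PART_TABLE = 446    # the partition table starts at byte 446 of the MBR
ENTRY_SIZE = 16     # four 16-byte primary partition entries


def partition_start_lbas(mbr: bytes) -> List[int]:
    """Return the starting LBA of each used primary partition entry."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) found")
    starts = []
    for i in range(4):
        off = PART_TABLE + i * ENTRY_SIZE
        if mbr[off + 4] == 0x00:  # partition type 0x00 = unused entry
            continue
        # Bytes 8-11 of the entry: starting LBA, little-endian unsigned 32-bit.
        (start,) = struct.unpack_from("<I", mbr, off + 8)
        starts.append(start)
    return starts


def is_aligned(start_lba: int) -> bool:
    """True if the partition's byte offset falls on a 4 KB block boundary."""
    return (start_lba * SECTOR_SIZE) % BLOCK_SIZE == 0


def scan_flat_vmdk(path: str) -> List[Tuple[int, bool]]:
    """Read sector 0 of a -flat.vmdk and report (start LBA, aligned?) per partition."""
    with open(path, "rb") as f:
        mbr = f.read(SECTOR_SIZE)
    return [(lba, is_aligned(lba)) for lba in partition_start_lbas(mbr)]
```

For example, the classic Windows 2003 start sector of 63 puts the partition at byte 32,256, which is 3,584 bytes past a 4 KB boundary, so is_aligned(63) is False; a start sector of 64 (byte 32,768) is aligned.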

While the readme does a good job of covering the basics, there are a number of caveats to running the tools correctly. I will go into each executable in depth, but before I go down into the weeds, you need to know when NOT to run them!

You cannot, or do not want to, use the alignment tool in the following cases:

Windows Server 2008 is already aligned if the machine was created as a Windows Server 2008 machine. If the machine was upgraded from Windows Server 2003, it will not be aligned.

Citrix servers are not supported because they remap the C: drive.

Windows dynamic disks are not supported and will be corrupted if an alignment is performed (mbrscan will detect them, though).

Linux LVM volumes are not supported (mbrscan may NOT detect all LVM partitions).

Windows Server 2003 non-boot disks that have been added (D:, E:, etc.) will lose their drive letters on alignment and will need to be remapped in Computer Management.

GRUB-booted Linux and Solaris machines will need GRUB reinstalled after alignment.
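Two of the caveats above can be spotted from the partition type byte in the MBR: 0x42 marks a Windows dynamic disk and 0x8E a Linux LVM partition. A minimal Python 3 sketch (hypothetical names) of flagging them; it is only a rough pre-check, not a replacement for mbrscan's own detection.

```python
# MBR partition type IDs (byte 4 of each partition table entry) that
# correspond to caveats in the list above.
RISKY_TYPES = {
    0x42: "Windows dynamic disk (LDM): do not align, it will be corrupted",
    0x8E: "Linux LVM: not supported by the alignment tool",
}


def caveats_for(partition_types):
    """Return a warning for each partition type that matches a known caveat.

    An empty result does NOT prove the disk is safe: LVM physical volumes
    do not have to use type 0x8E, and GRUB-booted or upgraded guests are
    not visible from the partition type at all.
    """
    return [RISKY_TYPES[t] for t in partition_types if t in RISKY_TYPES]
```

The loose LVM typing is one plausible reason a type-based check, like any scanner relying on it, can miss LVM volumes.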

With all that out of the way, it is a basic two-step process: 1. run mbrscan; 2. run mbralign on the machines that need it.

Step #1 - mbrscan

For mbrscan to give reliable results, the machine must either be powered off or have a VMware snapshot!

I put together a simple script that takes a VMware snapshot of every machine on an ESX/vSphere host.

I then run mbrscan with the parameter that scans all virtual machines: mbrscan --all

Once I have the scan results I need, I run another script to remove the VMware snapshots I just created on all machines on the host.
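The snapshot, scan, and cleanup sequence above could be outlined as follows. This is a hypothetical Python 3 sketch, not the author's script: the vmware-cmd createsnapshot/removesnapshots argument order is an assumption to check against your ESX version's help output, and the service console's bundled Python may be older than the Python 3 used here. It only prints the commands unless called with dry_run=False.

```python
import subprocess

MBRSCAN = "/opt/netapp/santools/mbrscan"  # default install location from the post
SNAP_NAME = "pre-mbrscan"                 # hypothetical snapshot name


def create_snapshot_cmd(vmx):
    # ASSUMED ESX 3.x service console syntax; verify before use:
    # vmware-cmd <vmx> createsnapshot <name> <description> <quiesce> <memory>
    return ["vmware-cmd", vmx, "createsnapshot", SNAP_NAME,
            "temporary snapshot for mbrscan", "0", "0"]


def remove_snapshots_cmd(vmx):
    # ASSUMED syntax. Note: this removes ALL snapshots of the VM,
    # not only the one created above.
    return ["vmware-cmd", vmx, "removesnapshots"]


def scan_plan(vmx_paths):
    """Snapshot every VM, scan all VMs once, then remove the snapshots."""
    plan = [create_snapshot_cmd(v) for v in vmx_paths]
    plan.append([MBRSCAN, "--all"])
    plan.extend(remove_snapshots_cmd(v) for v in vmx_paths)
    return plan


def run(vmx_paths, dry_run=True):
    """Print the commands (default) or execute them."""
    def do(cmd):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

    try:
        for v in vmx_paths:
            do(create_snapshot_cmd(v))
        do([MBRSCAN, "--all"])
    finally:
        # Attempt cleanup even if a snapshot or the scan failed; a removal
        # failure here raises and skips the remaining VMs.
        for v in vmx_paths:
            do(remove_snapshots_cmd(v))
```

The try/finally matters because leftover snapshots would block mbralign later.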

Step #2 - mbralign

In order to perform an alignment, ALL VMware snapshots MUST be removed!

In order to perform the alignment, you NEED free space equal to the size of the vmdk, because mbralign makes a backup copy of the file as its first step. The backup file has the extension -mbralign-backup.

In order to perform the alignment, the virtual machine MUST be powered off!

If the virtual machine has multiple vmdks, only one can be aligned at a time!
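The snapshot and free-space requirements can be checked before running mbralign. A minimal Python 3 sketch, assuming snapshot files follow the usual -000001.vmdk / -000001-delta.vmdk / .vmsn naming (a heuristic) and that shutil.disk_usage reports sensible numbers for the datastore path; the function names are hypothetical, and it does not check power state or the one-vmdk-at-a-time rule.

```python
import os
import re
import shutil

# Heuristic (assumption): snapshot delta disks are usually named like
# web01-000001.vmdk / web01-000001-delta.vmdk; snapshot state files end in .vmsn.
SNAPSHOT_FILE = re.compile(r"-\d{6}(-delta)?\.vmdk$|\.vmsn$")


def looks_like_snapshot_file(name):
    return SNAPSHOT_FILE.search(name) is not None


def preflight(flat_vmdk):
    """Return a list of problems that should block running mbralign on flat_vmdk."""
    vm_dir = os.path.dirname(os.path.abspath(flat_vmdk))
    problems = []

    snaps = sorted(f for f in os.listdir(vm_dir) if looks_like_snapshot_file(f))
    if snaps:
        problems.append("snapshot files present, remove all snapshots: " + ", ".join(snaps))

    # mbralign's first step is a full-size backup copy, so free space >= vmdk size.
    needed = os.path.getsize(flat_vmdk)
    free = shutil.disk_usage(vm_dir).free
    if free < needed:
        problems.append("need %d bytes free for the -mbralign-backup copy, have %d"
                        % (needed, free))

    # Not checked here: that the VM is powered off, and one vmdk at a time.
    return problems
```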

Run mbralign against the vmdk. I usually see about 1-2 GB per minute.

Boot up the virtual machine. If it works, delete the -mbralign-backup file.

On Windows systems, it will ask you to reboot one more time because it detects the hard disks as new hardware.

If it doesn't work, run mbralign again; it will detect the -mbralign-backup file and ask whether you would like to restore it. Very nice!
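Cleaning up after a successful boot amounts to finding the files ending in -mbralign-backup. A minimal Python 3 sketch with hypothetical function names; only the -mbralign-backup suffix comes from the tool, and the full file name used in the example is an assumption.

```python
import os

BACKUP_SUFFIX = "-mbralign-backup"  # suffix mbralign gives its backup copy


def is_mbralign_backup(filename):
    return filename.endswith(BACKUP_SUFFIX)


def find_backups(vm_dir):
    """List leftover mbralign backup files in a VM directory."""
    return sorted(os.path.join(vm_dir, f) for f in os.listdir(vm_dir)
                  if is_mbralign_backup(f))


def delete_backups_after_boot(vm_dir, booted_ok):
    """Delete backups only once you have confirmed the VM boots (booted_ok=True)."""
    if not booted_ok:
        return []  # keep them: rerunning mbralign can offer a restore
    removed = find_backups(vm_dir)
    for path in removed:
        os.remove(path)
    return removed
```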

25 comments:

Eric Forgette said...

Hi Aaron, Thanks for putting that post together. I think it's a great resource. While I can't comment on specifics, 'shortly' is probably overly optimistic with regard to a plugin. You can, however, use mbralign and mbrscan on ESX 4.0. In fact, the NetApp VMware Host Utilities Kit is fully supported on ESX 4.0. Thanks again for all your work on this! Cheers, -Eric

I am getting a "Device busy" message after taking a snapshot of a VM for mbrscan. Does the VM have to be shut down after taking a snapshot? I took the snapshot from the vSphere Client with and without "Snapshot the virtual machine's memory" and "Quiesce guest file system", and it didn't seem to help. Once the snapshot is taken, you still run mbrscan against the *-flat.vmdk file, correct?

This "Device busy" message also comes up with version 5.2 of the mbralign tool.

Could I also see the contents of the simple snapshot script you are talking about?

@anon - There might be a PowerCLI script out there, but I haven't seen one. (I really need to play with PowerCLI someday; I just have no time right now.)

@masa That is interesting. Yes, you either need the VM powered off or a VMware snapshot of the virtual machine, or you will get the resource busy error. Are you sure the VMware snapshot isn't timing out?

You are correct, you scan the flat file.

If you installed the tool with the NetApp VSC or HUK version 5.1 or greater, NetApp will now support the tool as well. You might give them a shot, or try the NetApp Communities thread linked in the article. I haven't tried in a while, but Eric (the original author of the tool) may be able to help you out a little more. Good luck!

Oh, about the scripts: I'll see if I can dig them up and post them in a separate post later this week.

I finally figured out what I was doing wrong. Our environment has many ESX hosts sharing several datastores, and mbrscan (or mbralign --scan) must be run against the *-flat.vmdk file from the ESX host that is hosting the VM's configuration (.vmx) file. I don't remember reading this in the documentation, but I probably missed it somewhere. I initially thought mbrscan could be run from any host against any shared *-flat.vmdk file, but that doesn't appear to be the case. Once I determined the correct ESX hosts with vmware-cmd -l, I was able to run mbrscan on VMs that were running with snapshots. I plan on scripting the rest tomorrow.
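Finding the host that has the VM registered, as described in this comment, can be sketched in Python 3. This assumes vmware-cmd -l prints one .vmx path per line and that the flat file sits in the same directory as its .vmx; both are assumptions (disks can live on other datastores), and the function names are hypothetical. Run it on each host and use the one that returns a match.

```python
import os
import subprocess


def registered_vmx(listing):
    """Parse `vmware-cmd -l` output, assumed to be one .vmx path per line."""
    return sorted(line.strip() for line in listing.splitlines()
                  if line.strip().endswith(".vmx"))


def owning_vmx(flat_vmdk, listing):
    """Return registered .vmx files in the same directory as flat_vmdk.

    Assumption: the disk lives next to its .vmx.
    """
    vm_dir = os.path.dirname(flat_vmdk)
    return [vmx for vmx in registered_vmx(listing) if os.path.dirname(vmx) == vm_dir]


def local_listing():
    """Run `vmware-cmd -l` on this host and return its output."""
    return subprocess.run(["vmware-cmd", "-l"], capture_output=True,
                          text=True, check=True).stdout
```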

I recently encountered an issue where alignment "appears" to have failed on a W2K8 VM. I hear W2K8 should be aligned automatically, so I assume this VM was upgraded from W2K3. When I run the mbralign command, it goes through the whole process and says it completed OK; I don't get any error messages. However, when I power on the VM, the console says "Error loading operating system". Somebody mentioned drive letter changes, so I removed the data disk, leaving only C:, but I still get the same error. The documentation says the "Windows VM SHOULD load", but in case it doesn't, it gives you nothing to try, unlike the Linux GRUB instructions. So the question is: how do you recover an aligned Windows VM that does not boot? Any info would be appreciated. Thanks!

I was still having some issues aligning Windows VMs (basic disk), and found the following:

http://communities.vmware.com/message/1638108

""Error loading operating system" after running mbralign from EHU 5.2 on a VMDK"

Duh!!

"DESCRIPTION: Under some circumstances, the mbralign utility that is packaged with the EHU 5.2 and VSC 2.0 may incorrectly calculate the Cylinder/Head/Sector information used by the bootloader. When this occurs, the resulting GOS will be left in a state that is not bootable."

The one I was having issues with in particular was a Windows XP VM. EHU 5.2 seemed to work on a Windows 2003 VM when I tried it.

So, the current recommendation may be to just use mbralign from EHU 5.1...
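For context on the CHS bug quoted above, here is the standard conversion between a sector's LBA and its Cylinder/Head/Sector address, assuming the common translated geometry of 255 heads and 63 sectors per track. It is a hypothetical Python 3 illustration, not mbralign's code: it shows why moving a partition start (for example from sector 63 to sector 64) means the CHS fields in the partition table must be recomputed, since a wrong value there is what the quoted advisory says leaves the guest unbootable.

```python
# Assumed translated geometry; a given disk may report a different one.
HEADS = 255
SECTORS_PER_TRACK = 63


def lba_to_chs(lba, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert an LBA to (cylinder, head, sector). Sectors are 1-based.

    No clamping: MBR CHS fields cannot hold cylinders above 1023.
    """
    cylinder = lba // (heads * spt)
    head = (lba // spt) % heads
    sector = (lba % spt) + 1
    return cylinder, head, sector


def chs_to_lba(c, h, s, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Inverse of lba_to_chs."""
    return (c * heads + h) * spt + (s - 1)
```

With this geometry, sector 63 is C/H/S 0/1/1 and sector 64 is 0/1/2, so even a one-sector shift changes the stored start address.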

Hoping you could help. I am testing the restore feature of the mbralign tool, but it doesn't look like it's working for me. I can successfully align a disk on ESX 3.5 and also create the backup file. However, when I run the align tool again (with the backup file in place), it doesn't ask whether I want to restore. It just ends with the message "Please remove or rename all files ending in -mbralign-backup."

Ah, so that is a relatively new version, I believe. I'm sorry, but I haven't worked with the tool in a long time (I no longer have access to NetApp gear). I'm wondering whether they pulled the feature out for some reason.

All the tool did was make a copy of the vmdk file and stick a backup extension on it. When you ran the tool again, it would check for the presence of this file and ask whether you wanted to restore it.

What I would suggest is making a copy manually yourself before the alignment; then you have it if you need it, and you can delete it if all goes well.

Of course, this means you need 100% disk space overhead, which may cause issues, but better safe than sorry!

It was VERY rare, but I did have one or two that got corrupted, so I would always recommend a backup of some kind.
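The manual backup suggested in this reply (copy before aligning, rename back to restore, delete once it boots) might look like this in Python 3. The .manual-backup suffix and function names are hypothetical, a plain file copy needs free space equal to the disk's size, and it should only be used with the VM powered off. On ESX, a VMFS-aware tool such as vmkfstools may be preferable to a plain copy, especially for large or thin-provisioned disks.

```python
import os
import shutil

BACKUP_SUFFIX = ".manual-backup"  # hypothetical suffix for your own copy


def backup_path(flat_vmdk):
    return flat_vmdk + BACKUP_SUFFIX


def make_backup(flat_vmdk):
    """Copy the flat file before aligning; needs free space equal to its size."""
    dst = backup_path(flat_vmdk)
    if os.path.exists(dst):
        raise FileExistsError(dst)  # never overwrite an existing backup
    shutil.copy2(flat_vmdk, dst)
    return dst


def restore_backup(flat_vmdk):
    """Move the copy back under the original name (VM powered off)."""
    os.replace(backup_path(flat_vmdk), flat_vmdk)


def discard_backup(flat_vmdk):
    """Delete the copy once the aligned VM has booted cleanly."""
    os.remove(backup_path(flat_vmdk))
```

restore_backup is the same manual rename described in the follow-up comment.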

Thanks for your reply. Here's how I made it work: I manually renamed the backup vmdk to the original vmdk name from an SSH session on my host. As you said, all the tool does is take a copy and put a backup extension on it, so manually moving it back to the original file name just worked. :-)

In our environment, we have a VM named abc, and it has 3 vmdks, named abc-flat.vmdk, disk1.vmdk and disk2.vmdk.

While running mbralign for disk1 and disk2, I get the message below and cannot run the alignment. Please help me out. Message: "The same vmdk was found in multiple vmx files. Please give explicit path to vmdk file location"

Sorry, I personally haven't seen that error message before so I can't tell you which way to go with that one.

I would check the directory that contains the vmdks to see whether you have multiple .vmx files for some reason. The only reason I can think of is that the machine was renamed or copied and, for some reason, you ended up with more than one .vmx file. That usually isn't the case.

The .vmx file that matches the virtual machine name currently in use should be the one you are looking for but again, no guarantees! Good Luck!
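One way to follow up on this error is to see which .vmx files in the VM's directory actually reference the disk. A minimal Python 3 sketch with hypothetical names, assuming disk entries use the usual scsiX:Y.fileName = "..." form; it compares base names only, so disks with the same name on different datastores would be counted as the same.

```python
import glob
import os
import re

# Disk entries in a .vmx look like: scsi0:1.fileName = "disk1.vmdk"
DISK_ENTRY = re.compile(r'^\s*\S+\.fileName\s*=\s*"([^"]+\.vmdk)"',
                        re.IGNORECASE | re.MULTILINE)


def disks_in_vmx(text):
    """Return the vmdk base names referenced by a .vmx file's text."""
    return [os.path.basename(p) for p in DISK_ENTRY.findall(text)]


def vmx_files_using(vm_dir, vmdk_name):
    """List every .vmx in vm_dir that references vmdk_name."""
    hits = []
    for vmx in sorted(glob.glob(os.path.join(vm_dir, "*.vmx"))):
        with open(vmx, errors="replace") as f:
            if vmdk_name in disks_in_vmx(f.read()):
                hits.append(vmx)
    return hits
```

If vmx_files_using returns more than one file for disk1.vmdk, the extra .vmx is the likely source of the message.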