I was recently involved in a Disaster Recovery rehearsal. The idea behind this was to prove that we could recover our key systems at another site should a disaster occur. We came across an interesting issue which I thought I would blog about in case it is of help to anyone who may also encounter it in a similar situation. Let's face it, in a disaster recovery scenario, you need as few difficult issues to deal with as possible..!

This issue is only really likely to affect virtual servers (provided you are using identical hardware to recover your physical ones that is).

We were using IBM Tivoli Storage Manager (TSM) to restore C: drive and System State backups of virtual servers (taken in the live environment) into a separate isolated network.

The process involved creating a new VM (typically from a template), with the same virtual hardware version, number of vCPUs, amount of memory, disk layout and virtual NIC type as the live server. This VM would have the same operating system version, edition, architecture (x86/amd64) and service pack installed as the live server, plus the TSM client to facilitate the restore.

On restoring the first Windows 2003 server in this manner, the server wouldn't boot. The VM console displayed a black screen (no error message) and the VM CPU usage immediately spiked to 100% and stayed there. This happened immediately after power on, so it was not possible to get the VM to respond to the F8 key in an attempt to put it into safe mode, to assist with troubleshooting.

It really looked like a hardware incompatibility, but I'd been very careful to make sure the necessary parameters matched, so was a bit mystified. After some head scratching and time spent comparing the hardware that Windows thought was present in the template and live VM (good job it wasn't a real disaster!), I spotted that the Hardware Abstraction Layer (HAL) didn't match between the two (the HAL can be checked via Device Manager – Computer). The template VM had an ACPI Uniprocessor HAL (which I expected as it had been built with only one vCPU), but the live VM had an ACPI Multiprocessor HAL. I was pretty sure at this point that this would be the cause of the issue.

Like the template VM, the live VM also had one vCPU, so why did it have a multiprocessor HAL? The key difference between the two VMs was how they had been created. The live VM had originally been created by P2V'ing a physical server. This physical server would no doubt have had multiple processors and hence when Windows was originally installed, had been given a multiprocessor HAL. This didn't change on P2V, but the person who did this elected for the VM to only have one vCPU - quite understandably as it was a relatively lightly loaded server. So it was running a single vCPU with a multiprocessor HAL (which is clearly a valid configuration).

The problem was introduced by the restore process. I assume that some aspect of the restore didn't replace something in the template, and part of the template VM's uniprocessor HAL was still operational after the restore and reboot – or not operational in fact!

The supported/correct way of setting a multiprocessor HAL would be to install the OS from scratch on a VM with more than one vCPU. However, that would have been time consuming for the number of variations of servers that we had to restore, and the time available didn't allow for that.

So how did I rectify this? Well, a few years ago I ran into a (different) issue attempting to give a singe vCPU VM an additional processor, and in the process of troubleshooting that, came across this post on ngohq.com by Squall Leonhart. This describes how to change the Windows HAL without reinstalling. Note that this approach is, obviously, totally unsupported!

The method involved using DevCon, which is basically a command-line version of Device Manager. DevCon can be downloaded from Microsoft here.

By running the following commands within the template VM, prior to the restore, I was able to update the HAL to an ACPI Multiprocessor one:

Squall recommends rebooting twice after doing this, to ensure that the device and IRQ tables get updated correctly.

After performing the steps above, a subsequent TSM restore was successful, and the server booted with no further problems. Result!

I have reproduced Squall's entire post below as this is such useful information which could be lost should the ngohq.com forum cease to exist for any reason – which would be a massive shame. It includes information on how to go to and from various HALs. Thanks for this incredibly helpful information Squall!

Heres some tips for upgraders!

You require the Devcon utility for this, unpack it to a folder, then navigate to the folder its in using Command prompt (command prompt on context menu PowerToy is handy for this)

How to enable APIC without repair installing windowsin device manager you will notice that under computer type it says Advanced Power and Control Interface PC.. this is a standard single processor HAL driver without APIC. to upgrade to the APIC driver you input the following:

after this, enable APIC in the bios if you haven't already, and reboot twice so windows can update the device and irq tables, it should now say ACPI Uniprocessor PC in the device manager

How to go back to PIC if you wish to go back to PIC from APIC enter this: devcon sethwid @ROOT\ACPI_HAL\0000 := +acpipic_up !acpiapic_updevcon update c:\windows\inf\hal.inf acpipic_up

and reboot twice to update the device and IRQ tables, and then disable APIC in the bios (the reason is, if you disable APIC before the device and irq tables update, windows will crash at startup.

How to Update from a Single Core APIC compatible cpu to a Multicore APIC compatible cpu

under the computer entry in the device manager, you will see it says ACPI Uniprocessor PC, to update to the multiprocessor HAL input this: devcon sethwid @ROOT\ACPI_HAL\0000 := +acpiapic_mp !acpiapic_up devcon update c:\windows\inf\hal.inf acpiapic_mp.

Then reboot twice again to update the device and IRQ tables.

How to go back to Single Core (should it be needed) if you accidentally burn your processor and have to go back to a single core backup, you input this into the devcon: devcon sethwid @ROOT\ACPI_HAL\0000 := +acpiapic_up !acpiapic_mp devcon update c:\windows\inf\hal.inf acpiapic_up.