
Windows 2008 R2 Failover Clustering and ESX

We ran into an issue the other day trying to configure a Windows Server 2008 R2 failover cluster on two HP ProLiant DL585 G1 servers (four dual-core Opteron 875 processors each). Microsoft had no native driver support in the OS for these servers' RAID controller, so it looked like we would need to buy new hardware for this SQL Server 2008 dev/test cluster initiative.

My brainchild was to install ESXi on the hosts and run each node of the cluster as a virtual machine, removing the dependency on the hardware. So we charged down this path to build a Windows Server 2008 R2 failover cluster:

installing the ESXi server - flawless

installing 2008 R2 - flawless

installing the failover cluster - failed.

What? Two tests failed during cluster validation. The first was the lack of HBAs in the servers (no problem). The second, SCSI-3 Persistent Reservation, confused us: validation reported that "putting PR reserve on cluster disk 0 was successful when it should have failed" and recommended checking the storage configuration so that it functions properly for failover clusters.

I also came across a write-up about running a failover cluster in VMware Workstation against a FreeNAS iSCSI target. Hmmmm, would that work?

Downloading Openfiler as I type...

-------------Update-10/13/2011-------------------

Over a year later, and hopefully a whole lot smarter: Openfiler didn't support SCSI-3 reservations, and I quickly learned that without them clustering will never work. SCSI-3 reservations control how a LUN or disk is locked while a computer is talking to it.

Here is a good breakdown from Symantec's site:

SCSI-3 persistent reservations

SCSI-3 Persistent Reservations (SCSI-3 PR) are required for I/O fencing
and resolve the issues of using SCSI reservations in a clustered SAN
environment. SCSI-3 PR enables access for multiple nodes to a device and
simultaneously blocks access for other nodes.

SCSI-3 reservations are persistent across SCSI bus resets and support
multiple paths from a host to a disk. In contrast, only one host can use
SCSI-2 reservations with one path. If the need arises to block access
to a device because of data integrity concerns, only one host and one
path remain active. The requirements for larger clusters, with multiple
nodes reading and writing to storage in a controlled manner, make SCSI-2
reservations obsolete.

SCSI-3 PR uses a concept of registration and reservation. Each system
registers its own "key" with a SCSI-3 device. Multiple systems
registering keys form a membership and establish a reservation,
typically set to "Write Exclusive Registrants Only." The WERO setting
enables only registered systems to perform write operations. For a given
disk, only one reservation can exist amidst numerous registrations.

With SCSI-3 PR technology, blocking write access is as simple as
removing a registration from a device. Only registered members can
"eject" the registration of another member. A member wishing to eject
another member issues a "preempt and abort" command. Ejecting a node is
final and atomic; an ejected node cannot eject another node. In VCS, a
node registers the same key for all paths to the device. A single
preempt and abort command ejects a node from all paths to the storage
device.

Confused yet?
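To make the registration/reservation model in the Symantec excerpt a little more concrete, here is a toy, in-memory sketch in Python. It is not how any real array, driver, or VCS implements SCSI-3 PR: ToyDisk, ReservationConflict, and the path names are hypothetical, only the WERO reservation type is modeled, and what happens to the reservation after a preempt is simplified. It shows keys registered per path, a single reservation per disk, a second node's reserve being refused (the behavior the cluster validation expected), and one preempt-and-abort ejecting all of a node's paths at once.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class ReservationConflict(Exception):
    """Stands in for the SCSI RESERVATION CONFLICT status (toy model)."""


@dataclass
class ToyDisk:
    """In-memory stand-in for one shared LUN (hypothetical, not a real API)."""

    # Registrations: initiator/path name -> the key its node registered.
    registrations: Dict[str, int] = field(default_factory=dict)
    # Only one reservation can exist per disk; remember which path holds it.
    holder: Optional[str] = None

    def register(self, initiator: str, key: int) -> None:
        # A node registers the same key on every one of its paths.
        self.registrations[initiator] = key

    def reserve(self, initiator: str) -> None:
        """Take the single Write Exclusive Registrants Only (WERO) reservation."""
        if initiator not in self.registrations:
            raise ReservationConflict(f"{initiator} is not registered")
        if self.holder is not None and self.holder != initiator:
            # A reserve from a second node must be refused; storage that
            # lets it succeed is what the validation test complains about.
            raise ReservationConflict(f"already reserved by {self.holder}")
        self.holder = initiator

    def can_write(self, initiator: str) -> bool:
        # No reservation: disk is open. Under WERO only registrants may write.
        return self.holder is None or initiator in self.registrations

    def preempt_and_abort(self, initiator: str, victim_key: int) -> None:
        """Eject every registration carrying victim_key (all paths of one node)."""
        if initiator not in self.registrations:
            # Only registered members may eject, so an ejected node cannot.
            raise ReservationConflict(f"{initiator} is not registered")
        ejected = [name for name, key in self.registrations.items() if key == victim_key]
        for name in ejected:
            del self.registrations[name]
        if self.holder in ejected:
            # Simplification: the preempting path takes over the reservation.
            self.holder = initiator
```

For example, register nodeA-path1 and nodeA-path2 with key 0xA and nodeB-path1 with 0xB, then reserve from node A: a reserve from node B raises ReservationConflict, node B can still write because it is registered, and preempt_and_abort("nodeB-path1", 0xA) cuts off both of node A's paths in a single step.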

The end of the story: make sure the storage you are using supports SCSI-3 persistent reservations, present that shared storage to both cluster nodes, and cluster away.

Ever wanted to ping an entire subnet but didn't have a tool to do it? Well, this PowerShell one-liner does just that.

The for at the beginning sets the variable $i to 1; the semicolon separates that from the condition that sets how far the loop runs (addresses 1 through 254); and $i++ then increments $i by 1 on each pass. (Change the increment, e.g. to $i += 5, if you only want every 5th IP.)

The next section runs the Windows built-in ping.exe command (make sure you include the .exe extension) with the -n switch (the number of echo requests to send), followed by the IP to ping in brackets. The where statement at the end filters for "bytes=32", which appears in ping's default output for each successful reply.
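For anyone who would rather script the same sweep outside PowerShell, here is a rough Python equivalent of the loop described above. It is a sketch, not the author's one-liner: the function names are hypothetical, it assumes Windows ping.exe (where -n is the echo count and -w the timeout in milliseconds), and the "bytes=32" check is made case-insensitive on the assumption that the original filter used PowerShell's -match, which ignores case by default.

```python
import subprocess
from typing import Iterator, List


def host_addresses(prefix: str, start: int = 1, stop: int = 254, step: int = 1) -> Iterator[str]:
    """Yield prefix.start through prefix.stop, like the loop's $i counter."""
    for i in range(start, stop + 1, step):
        yield f"{prefix}.{i}"


def is_reply(ping_output: str) -> bool:
    """Mirror the where filter: a successful Windows reply contains 'bytes=32'.

    Lowercased to match case-insensitively, as PowerShell's -match does.
    """
    return "bytes=32" in ping_output.lower()


def sweep(prefix: str, step: int = 1) -> List[str]:
    """Ping each address once with Windows ping.exe and return the responders.

    Windows only: on Linux/macOS, ping's -n and -w switches mean other things.
    """
    alive: List[str] = []
    for ip in host_addresses(prefix, step=step):
        result = subprocess.run(
            ["ping.exe", "-n", "1", "-w", "500", ip],  # one echo, 500 ms timeout
            capture_output=True,
            text=True,
            errors="replace",  # tolerate non-ASCII characters in localized output
        )
        if is_reply(result.stdout):
            alive.append(ip)
    return alive
```

On a Windows machine, sweep("192.168.1") would return the addresses that answered, and sweep("192.168.1", step=5) mirrors the every-5th-IP variation; 192.168.1 is only a placeholder prefix. Note that "Destination host unreachable" replies do not contain bytes=32, so they are filtered out just as in the one-liner.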

I was working with VMware Update Manager, running a scan across the entire VM guest infrastructure, when the Update Manager service hung. I am not going to say anything more about that. :)

So I went to the Update Manager server and tried to restart the service. It sat in Stop Pending for quite a while, so I decided to kill the process. Hmmm, how do I do that? I was going to use PowerShell, but the only service cmdlets available are Get-Service, Stop-Service, and Restart-Service, which amount to the same thing as going through the GUI. Instead, two built-in command-line tools did the job:

Tasklist.exe /SVC
This lists every running process, the services hosted in each one, and its PID.

Taskkill.exe /PID <PID #> /T
This terminates the process with that PID (the hung service) along with any child processes it started (/T).

This killed the update scan in the VI Client and allowed me to restart the service.
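The same two steps can be scripted. The Python sketch below is an illustration under assumptions, not a tested tool: it assumes tasklist's CSV output (/SVC /FO CSV /NH) puts the image name, PID, and a comma-separated service list in three quoted columns, with N/A when a process hosts no services, and the function names are hypothetical. It only builds the taskkill command; the optional /F switch forces termination, which a hung service process may need.

```python
import csv
import io
import subprocess
from typing import Dict, List, Optional


def parse_tasklist_svc_csv(text: str) -> Dict[str, int]:
    """Map lowercased service name -> PID from 'tasklist /SVC /FO CSV /NH' text.

    Assumed row shape: "Image Name","PID","Svc1,Svc2" or ... ,"N/A".
    """
    services: Dict[str, int] = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        _, pid, svc_field = row[:3]
        if svc_field.strip() == "N/A":
            continue  # process hosts no services
        for name in svc_field.split(","):
            name = name.strip()
            if name:
                services[name.lower()] = int(pid)
    return services


def find_service_pid(service: str) -> Optional[int]:
    """Run tasklist.exe (Windows only) and look up the PID hosting 'service'."""
    out = subprocess.run(
        ["tasklist.exe", "/SVC", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    return parse_tasklist_svc_csv(out).get(service.lower())


def taskkill_command(pid: int, force: bool = False) -> List[str]:
    """Build the call used above: taskkill /PID <pid> /T, plus /F if forced."""
    cmd = ["taskkill.exe", "/PID", str(pid), "/T"]
    if force:
        cmd.append("/F")
    return cmd
```

On Windows, pid = find_service_pid("ExampleService") followed by subprocess.run(taskkill_command(pid), check=True) would repeat what was done by hand here; ExampleService is a placeholder, and find_service_pid returns None when nothing matches, so check for that before killing anything.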