In theory, what I wanted to do was simply schedule the VMs to be cloned to the second host as a backup. If something happened to the first server, I could simply boot the VMs on the second machine and resume as if the first had never gone down.

In practice, having done quite a few searches here on SF, this seems a lot less practical.

My main concern is the integrity and consistency of the SQL database... This backup strategy does not seem to be recommended for SQL servers due to unwritten data residing in memory. I suppose I could shut down the server, clone it, then reboot, but in my perfect world, I'd like to duplicate these VMs at least nightly while they are still live.

What would be the best backup strategy for replicating these particular types of servers to a second ESXi host nightly while they are still live? Consider separate options for a budget of $1,000 and a budget of $10,000.

2 Answers

Although it won't fit into even a budget of $10,000, the ultimate option is to have two SANs with the data replicated live between them, and then use VMware SRM to boot the VMs on the other side in the event of a failure.

For a budget of $10,000 you should be able to get a single SAN array and then use VMware's High Availability function, which means that if a host fails, all its VMs are automatically restarted on the other hosts. This makes the SAN a single point of failure, and you need to make sure it's fast enough not to become a bottleneck that impacts your daily work.

For a budget of $1,000 I would suggest a "cheap" NAS (such as a QNAP 4xx series) and expose shared storage via iSCSI. They only expose 1GbE interfaces, which would be fine for running things like a domain controller, but not much else (I've tried this: we have a 6 TB QNAP here and it's just not up to the job of heavy iSCSI load).

Personally, what I would suggest, if you can afford some downtime during a failover, is to have a second SQL Server installed on Host B and do transaction log shipping to it. You may not even have to purchase any additional hardware for this and, although you should check with your Microsoft representative, you might not even need to license it. Keep both servers running, and re-point your applications to the second SQL Server when the first host goes offline.
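To give a rough idea of what one log-shipping cycle boils down to, here is a minimal Python sketch that only builds the two T-SQL statements involved: a `BACKUP LOG` on the primary and a `RESTORE LOG ... WITH NORECOVERY` on the standby. The database name `Accounting`, the share `\\hostb\logship`, and the helper `log_shipping_statements` are made up for illustration. In practice SQL Server's built-in log shipping jobs take care of this, plus copying the file across, so treat it as a sketch rather than something to deploy.

```python
from datetime import datetime
from typing import Tuple


def log_shipping_statements(database: str, share: str, when: datetime) -> Tuple[str, str]:
    """Build the T-SQL for one log-shipping cycle (hypothetical helper).

    Returns (backup_statement, restore_statement). The backup runs on the
    primary; the restore runs on the standby and leaves the database in a
    restoring state so later log backups can still be applied.
    """
    stamp = when.strftime("%Y%m%d_%H%M%S")
    path = f"{share}\\{database}_{stamp}.trn"
    backup = f"BACKUP LOG [{database}] TO DISK = N'{path}';"
    restore = f"RESTORE LOG [{database}] FROM DISK = N'{path}' WITH NORECOVERY;"
    return backup, restore


# Example with made-up names: a backup taken on 2 Jan 2012 at 03:04:05.
backup_sql, restore_sql = log_shipping_statements(
    "Accounting", r"\\hostb\logship", datetime(2012, 1, 2, 3, 4, 5)
)
print(backup_sql)
print(restore_sql)
```

When you actually fail over, the standby needs one final `RESTORE DATABASE [Accounting] WITH RECOVERY;` to bring it online, and that step plus re-pointing the applications is where the downtime comes from.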

Also, I strongly advise against cloning your domain controller, as there are issues with going backwards after a restore (or a snapshot). Again, I would suggest having two DCs, one on each host, and letting Active Directory's own replication deal with it.

For your accounting and RDS servers, your cloning solution should work fine. I don't know what you're running on your RDS server, but we decided we could afford to lose up to 24 hours of data without serious repercussions, so if you were to just clone it at night you might be OK with it.

Look into VMware vSphere 5.0 with VSA (vSphere Storage Appliance). This will allow you to run a cluster on the two machines and will automatically replicate the VMs' storage between the two machines in real time.

Data within SQL Server isn't considered committed until it is written to the transaction log. Once the log record has been written to disk, the client application gets the notice that the write is complete. Even if the changed pages are still held in memory and have not been written to disk, the writes to the log will have been completed. When the database comes online on the new host, the transactions marked as committed in the log are read from the log and applied before users are able to log into the system.
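If it helps to see the idea, here is a toy Python model of that write-ahead behaviour. It is not how SQL Server is implemented, just a sketch: the class `ToyDatabase` and the invoice keys are invented, the `log` list stands in for the on-disk transaction log, and `memory_pages` stands in for changed pages that have not been flushed yet.

```python
class ToyDatabase:
    """Toy write-ahead-logging model (illustrative only, not SQL Server)."""

    def __init__(self):
        self.log = []           # durable: stands in for the on-disk transaction log
        self.disk_pages = {}    # durable: data pages already flushed to disk
        self.memory_pages = {}  # volatile: changed pages held only in memory

    def commit(self, key, value):
        # The log record is written first; only then is the client told
        # that the write completed. The data page changes in memory only.
        self.log.append((key, value))
        self.memory_pages[key] = value
        return "committed"

    def checkpoint(self):
        # Flush the changed pages from memory to disk.
        self.disk_pages.update(self.memory_pages)

    def crash(self):
        # Power loss or host failure: everything in memory is gone.
        self.memory_pages = {}

    def recover(self):
        # On startup, replay every committed log record onto the data pages.
        for key, value in self.log:
            self.disk_pages[key] = value
        self.memory_pages = dict(self.disk_pages)


db = ToyDatabase()
db.commit("invoice-1", 100)
db.checkpoint()              # invoice-1 reaches the data pages on disk
db.commit("invoice-2", 250)  # invoice-2 is in the log and in memory only
db.crash()                   # the in-memory page for invoice-2 is lost
db.recover()                 # replaying the log brings invoice-2 back
print(db.disk_pages)
```

The real recovery process also rolls back transactions that were still in flight when the host went down, which this sketch leaves out.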

In this setup you should actually be running two domain controllers, one on each host (the rules can help you ensure this happens), so that when a host goes down one domain controller is still online and everything else keeps running until the other guest OSs come back online.