DR in a Box

Virtualization offers big advantages over the physical world in a key area of IT: disaster recovery (DR). Not having to exactly duplicate your mission-critical hardware setup in an offsite location can result in huge cost savings.

One of virtualization's pioneering vendors, PlateSpin (now owned by Novell), has released a DR product worthy of your attention. PlateSpin Forge is a hardware appliance that can protect your critical data and make it recoverable with a speed that belies its reasonable cost.

Protected Workloads
PlateSpin Forge manages server workloads, here defined as a server's data, applications and OS, for physical as well as virtual systems. The base version covers 10 systems and can scale up to 25 per appliance. Once these systems are identified, PlateSpin Forge maintains each protected workload in a standby virtual environment. The aim is to make the workload a portable object: PlateSpin Forge's failover and failback features can move it between environments with minimal effort and downtime.

The protected workloads are selected Windows systems that the appliance keeps up to date. There are three transfer methods for this:

A file-based transfer mechanism

Volume Shadow Copy Service (VSS)

A block-level replication transfer (this is the preferred method)
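To see why a block-level approach tends to be preferred, consider what must cross the wire after a small change to a large file. The sketch below is a generic illustration of block-level change detection by comparing per-block hashes; the 4KB block size, SHA-256 hashing and the function names are assumptions, not a description of PlateSpin Forge's replication engine.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """SHA-256 digest of each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of blocks in `new` that differ from `old` or did not exist before."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or h != old_h[i]]


# A 1MB "file" with a single byte modified.
original = bytes(1_048_576)
modified = bytearray(original)
modified[500_000] = 1
changed = changed_blocks(original, bytes(modified))
print(changed)  # [122]
print(len(changed) * BLOCK_SIZE, "bytes to send vs", len(modified), "for a whole-file copy")
```

Only one 4KB block needs to be sent, whereas a file-level copy would typically resend the entire file.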

How It Works
PlateSpin Forge is delivered with one management virtual machine (VM) that provides the Web management interface and controls the appliance. The appliance is based on ESX 3.5 from VMware Inc. Each protected workload has a corresponding VM on the appliance that runs in the Windows Preinstallation Environment and interacts with the protected system on the schedule configured in the management interface.

Once a system is added to the protection schedule, an initial replication starts. After that, the workload can be assigned to the desired "protection tier," which determines how the organization's recovery point objectives (RPOs) are met. PlateSpin Forge can replicate a workload as often as hourly, which yields a one-hour RPO. Recovery is fairly quick, at approximately 15 minutes, which makes for an appealing recovery time objective (RTO) given the number of workloads protected on the system.
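The one-hour RPO figure corresponds to the replication interval. As a rough sketch, assuming each cycle captures a point-in-time copy when it starts and that copy becomes usable only once the cycle finishes, worst-case data loss is the interval plus the time a cycle takes to complete. The function names and the 10-minute cycle duration below are illustrative assumptions, not PlateSpin Forge settings.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, replication_duration: timedelta) -> timedelta:
    """Worst-case data-loss window for scheduled replication.

    Assumes each cycle snapshots the source when it starts and the copy is
    usable only after the cycle finishes, so a failure just before the next
    cycle completes loses everything since the previous snapshot.
    """
    return interval + replication_duration


def meets_objectives(
    interval: timedelta,
    replication_duration: timedelta,
    recovery_time: timedelta,
    rpo_target: timedelta,
    rto_target: timedelta,
) -> bool:
    """True if both the worst-case RPO and the recovery time are within target."""
    return (
        worst_case_rpo(interval, replication_duration) <= rpo_target
        and recovery_time <= rto_target
    )


# Hourly replication that takes 10 minutes per cycle, with a ~15-minute recovery.
hourly = timedelta(hours=1)
cycle = timedelta(minutes=10)
print(worst_case_rpo(hourly, cycle))  # 1:10:00
print(meets_objectives(hourly, cycle, timedelta(minutes=15),
                       rpo_target=timedelta(hours=2),
                       rto_target=timedelta(minutes=30)))  # True
```

Under these assumptions, a strict one-hour RPO target would fail the check, so replication cycles should finish well inside the interval.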

Figure 1 shows five protected workloads with different protection tiers within the Web-based management interface.


Figure 1. Different PlateSpin Forge workloads are shown with their replication schedules to provide a quick look at their status.

Once workloads are assigned to a protection tier, the management VM provides useful information for making ongoing decisions about them. Most admins' questions will concern the resulting network traffic. While PlateSpin Forge can't make the network magically work better, it provides detailed information on what happens during a replication, including how long it takes and how much data makes up each incremental update. Incremental updates run on the protection tier's schedule, and their size varies widely by workload. Figure 2 shows the replication window report.
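The duration and incremental-size figures in that report lend themselves to simple capacity arithmetic. A minimal sketch, assuming the incremental size and replication window are read from the report and that workloads replicating in the same window share one WAN link; the function names, sizes and link speed are hypothetical.

```python
def required_mbps(incremental_bytes: int, window_seconds: float) -> float:
    """Average throughput in megabits per second needed to move one
    incremental update within its replication window."""
    return incremental_bytes * 8 / window_seconds / 1_000_000


def link_utilization(workloads: list[tuple[int, float]], link_mbps: float) -> float:
    """Fraction of a link consumed if every (bytes, window_seconds) pair
    replicates concurrently; values above 1.0 mean the windows can't be met."""
    total = sum(required_mbps(size, window) for size, window in workloads)
    return total / link_mbps


# Hypothetical: two workloads, each pushing a 2GB incremental within one hour,
# over a 10Mbps WAN link.
workloads = [(2_000_000_000, 3600), (2_000_000_000, 3600)]
print(round(required_mbps(2_000_000_000, 3600), 2))  # 4.44
print(round(link_utilization(workloads, 10), 2))     # 0.89
```

Averages understate peaks, so a utilization close to 1.0 is already a warning sign that replication could swallow the link.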


Figure 2. PlateSpin Forge's traffic report shows the network usage for each protected workload. This is critically important, as too much replication can swallow a network.

Server Failed: Now What?
When a server fails, PlateSpin Forge takes control and brokers the next steps based on administrator input. It can be configured to send actionable alerts to a smartphone, an e-mail address or the management Web page. Once the failover is initiated, PlateSpin Forge brings the VM assigned to that workload online.

On the networking front, PlateSpin Forge can assign the VM a new TCP/IP address during a managed failover. When the appliance is located in a remote data center on a separate network, it handles any required address change for the destination network as part of the failover process. For VMs built to re-establish their database connections and start required services on their own, this can make for an entirely hands-off failover. The entire failover process takes about 15 minutes for most workloads, varying slightly with guest boot times. Note that SysPrep is not used during workload failover. That matters, because some components in the Windows environment (such as vendor licensing) may not function correctly after a SysPrep task.
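The review doesn't describe how PlateSpin Forge chooses the new address. As an illustration only, one common convention when moving a server to a DR subnet of the same size is to keep the host portion of the address and swap the network portion. `remap_address` and the subnets below are hypothetical.

```python
import ipaddress


def remap_address(addr: str, source_net: str, dest_net: str) -> str:
    """Move an address into the destination subnet, keeping its host offset.

    Hypothetical convention: 10.1.20.45 in 10.1.20.0/24 keeps host .45 in the
    DR subnet. Both subnets must share IP version and prefix length.
    """
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(dest_net)
    ip = ipaddress.ip_address(addr)
    if ip not in src:
        raise ValueError(f"{addr} is not in {source_net}")
    if src.version != dst.version or src.prefixlen != dst.prefixlen:
        raise ValueError("subnets must have the same IP version and prefix length")
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


print(remap_address("10.1.20.45", "10.1.20.0/24", "192.168.50.0/24"))  # 192.168.50.45
```

Keeping the host portion makes DR addresses predictable, which helps when DNS records or firewall rules at the recovery site are prepared in advance.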

Native Failover and Failback
While many products can manage a failover, PlateSpin Forge has managed-failback functionality that can transfer the live workload back to the repaired original system. This is a key differentiator for an organization that may be considering VMware's Site Recovery Manager, which doesn't yet provide automated failback (VMware is expected to add it to a future release). PlateSpin Forge takes managed failback one step further with the option to restore the workload to a VM or to physical hardware.

PlateSpin Forge allows the failover procedure to be tested in an isolated environment without impacting the production network. Testing the failover process produces actual timings for the protected systems, which helps admins confirm they can meet their defined RTOs.
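Recording the elapsed time of each isolated test gives a concrete list to check against the RTO. A minimal sketch, assuming the timings have been collected by hand or exported from the test runs; the function name, workload names and durations are hypothetical.

```python
from datetime import timedelta


def workloads_missing_rto(test_results: dict[str, timedelta], rto: timedelta) -> list[str]:
    """Names of workloads whose measured test-failover time exceeded the RTO,
    sorted for stable output."""
    return sorted(name for name, elapsed in test_results.items() if elapsed > rto)


# Hypothetical timings recorded from isolated test failovers.
results = {
    "sql01": timedelta(minutes=22),
    "web01": timedelta(minutes=12),
    "file01": timedelta(minutes=15),
}
print(workloads_missing_rto(results, rto=timedelta(minutes=15)))  # ['sql01']
```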

New appliances often raise questions about supportability, but not in this case. The PlateSpin Forge 510 and 525 models are built on the Dell PowerEdge 2950 III server, and the 310 and 325 models are built on the PowerEdge 1950 III. PlateSpin Forge is supported by Novell, with any equipment exchanges handled by Dell.

Caveats
While PlateSpin Forge delivers native functionality that will fit many organizations, it does have some limitations:

It can't be used in some configurations that may seem possible given its software and hardware. Specifically, it can't host a VM that's a member of a cluster with a node outside the appliance.

It can't co-host a VM with another ESX server to survive a host failure, the way a Marathon everRun solution or VMware's upcoming fault-tolerance functionality does.

While PlateSpin Forge uses ESX 3.5 as the underlying hypervisor, it can't be placed into a configuration to be managed by vCenter (formerly VirtualCenter). Instead, it includes a management VM for all appliance tasks. The ESX Web interface is available as a separate console, however, for basic tasks related to host storage management, networking and PlateSpin Forge performance.

Just the Facts
PlateSpin Forge is offered in four models; the base configuration protects 10 workloads, and an appliance can scale to 25. The 500 series appliance is a capable system with dual 2.6GHz quad-core processors and 2.5TB of local SATA storage configured as RAID 5. The base models start with 16GB of RAM, expandable to a maximum of 32GB.
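When choosing between 16GB and 32GB, a useful question is whether the appliance could run every protected workload at once if an entire site fails. A rough, conservative sketch that ignores ESX memory overcommitment; the per-VM allocations and the memory reserved for the hypervisor and management VM are assumptions, not published figures.

```python
def fits_in_failover(vm_memory_gb: list[float], appliance_gb: float, reserved_gb: float) -> bool:
    """Conservative check: can every protected VM run at once without relying
    on memory overcommitment? reserved_gb stands in for the hypervisor and
    management VM (an assumed figure)."""
    return sum(vm_memory_gb) <= appliance_gb - reserved_gb


# Hypothetical: ten workloads configured with 2GB each, 4GB held back.
ten_workloads = [2.0] * 10
print(fits_in_failover(ten_workloads, appliance_gb=16, reserved_gb=4))  # False
print(fits_in_failover(ten_workloads, appliance_gb=32, reserved_gb=4))  # True
```

Because ESX can overcommit memory, failing this check doesn't mean failover is impossible, only that performance may suffer with every workload running.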

The base prices include the management components, the ESX hypervisor and the management VM. PlateSpin Forge can also attach to an iSCSI or Fibre Channel SAN to use existing storage systems.

PlateSpin Forge is a strong all-in-one solution that fits into most environments with little configuration, offering high functionality at a right-sized cost. Small and midsize businesses can make a strong case for PlateSpin Forge. Larger shops may run into scaling issues if they need to run all of the core data center's workloads in a DR situation, but may see a benefit in remote or branch offices that have their own technology footprint.

About the Author

Rick Vanover (Cisco Champion, Microsoft MVP, VMware vExpert) is based in Columbus, Ohio. Vanover's experience includes systems administration and IT management, with virtualization, cloud and storage technologies being the central theme of his career recently. Follow him on Twitter @RickVanover.