IBM Enhances System z Availability

Reliability and high availability are hallmarks of the System z, designed into the system architecture from the start. This is essential for classic transaction processing production workloads and is one of the reasons so many of those workloads run on the System z. Now the growing interest in using the z as the core of a private cloud, and for other new workloads, adds another dimension to the system availability issue.

The Geographically Dispersed Parallel Sysplex (GDPS) has been IBM’s primary vehicle for achieving extremely high levels of System z availability. A recent IBM announcement expanded the GDPS options, primarily by adding remote asynchronous replication to greatly extend the distance between the paired systems. Further enhancements, announced but not yet available, will add GDPS Active/Query configurations, which will provide the ability to selectively query data at either site in the replicated sysplex.

GDPS, essentially, is system clustering technology for the z. You set up two systems, one as a mirror of the other, and update the data synchronously or asynchronously. When the primary system fails, you bring up the other and resume working as before. How you define your recovery point objective (RPO) and recovery time objective (RTO) determines how smoothly you can pick up after a failure and how much data you can afford to lag or lose.

Until now, the best availability System z users could achieve was through GDPS/PPRC. It is based on a multisite Parallel Sysplex using synchronous disk replication in a metro-area Continuous Availability (CA) and Disaster Recovery (DR) scenario. It comes in two flavors, active/standby and active/active. This is where you can hit your tightest RPO and RTO. Synchronous replication, however, entails distance constraints that make it inappropriate for many organizations. It’s also quite expensive.
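Why does synchronous replication limit distance? Each write must be confirmed by the remote site before the application can continue, so the signal travel time is added to every write. The short Python sketch below estimates that penalty. It assumes light in optical fiber covers roughly 200,000 km/s (about 5 microseconds per km one way), one round trip per write, and no switch or protocol overhead; real GDPS/PPRC behavior will differ, and the function name is purely illustrative.

```python
# Rough estimate of the propagation delay synchronous replication adds to each write.
# Assumptions (illustrative only): light travels through optical fiber at roughly
# 200,000 km/s, i.e. about 5 microseconds per km one way; equipment and protocol
# overhead are ignored; each write needs `round_trips` round trips to the remote site.

FIBER_US_PER_KM = 5.0  # approximate one-way delay per km of fiber (assumption)


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Added latency, in milliseconds, for one synchronously mirrored write."""
    one_way_us = distance_km * FIBER_US_PER_KM
    # The write is not complete until the signal reaches the remote site and the
    # acknowledgment comes back, so the one-way delay is paid twice per round trip.
    return 2 * one_way_us * round_trips / 1000.0


if __name__ == "__main__":
    for km in (10, 100, 1000, 3000):
        print(f"{km:>5} km: +{sync_write_penalty_ms(km):.1f} ms per write")
```

Under these assumptions a metro-area pair adds well under a millisecond per write, while sites a few thousand kilometers apart would add tens of milliseconds to every write, which is why long-distance pairings rely on asynchronous replication instead.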

Asynchronous replication is not bound by synchronous distance constraints. IBM offers GDPS/XRC and GDPS/GM, based on asynchronous disk replication with unlimited distance. The current GDPS async replication products, however, require the failed site’s workload to be restarted at the recovery site, which typically takes 30 to 60 minutes. That will not satisfy organizations with an RTO of seconds.

In its latest announcement IBM presents GDPS active/active continuous availability as the next generation of GDPS. This represents a shift from the failover model, where systems go down and the workload must be restarted at the failover site, a process that can take an hour or more, to a near continuous availability model, where the system can be brought back online far faster. IBM describes the latest enhancements as combining the best attributes of the existing suite of GDPS services and expanding them to allow unlimited distances between your data center sites, with RTO measured in minutes if not seconds. With its latest GDPS solutions, IBM promises near continuous availability, meaning it can meet an RTO of tens of seconds.

As organizations increasingly turn to their zEnterprise to handle more and different workloads, a new concept of mission critical is starting to emerge. Mission critical might now include real-time analytics or the z as the enterprise private cloud, and the continuous availability model will become attractive here too. Right now the latest GDPS enhancements focus solely on the System z. IBM suggests it will later expand that focus to include the other platforms that make up the hybrid zEnterprise.

Of course, most mainframe shops don’t use GDPS at all and manage quite well. Often they rely on tape backup and can withstand an RPO and RTO of hours, even a day or two. DancingDinosaur addressed this issue in February. Not every workload or organization needs a fast RTO, but for those that do, there is the enhanced GDPS.