The ability to use Crossbow VNICs as endpoints for the cluster private interconnects. You can even send the cluster traffic over the public network and secure it with IPsec.

Support for exporting locally attached storage as iSCSI targets with COMSTAR iSCSI. You can obtain redundant “shared storage” without true shared storage by creating a mirrored zpool out of iSCSI-accessible local disks on two different nodes of the cluster.

Taken together, these features contribute to “hardware minimization,” allowing you to form a cluster with fewer physical hardware requirements.

This release runs on both SPARC and x86/x64 systems and includes the following agents:

Apache Webserver

Apache Tomcat

MySQL

GlassFish

NFS

DHCP

Kerberos

Samba

Solaris Containers (for ipkg Zones)

Open HA Cluster 2009.06 is distributed as IPS packages from the https://pkg.sun.com/opensolaris/ha-cluster repository. To gain access, accept the license agreement at https://pkg.sun.com to obtain a certificate and key. Then follow the instructions given at registration to configure your system's access to the ha-cluster publisher.

To install the complete cluster, including agents, install the “ha-cluster-full” package. To install a minimal cluster, without agents and other optional components, install the “ha-cluster-minimal” package instead. You can then install the individual agents and other optional components.

Open HA Cluster 2009.06 is free to use, with production-level support offerings available for two-node clusters. This release runs on OpenSolaris 2009.06 only.

Monday Apr 27, 2009

Join us as we explore the latest trends in High Availability Cluster technologies, as well as key insights from HA Clusters community members, technologists, and users of High Availability and Business Continuity software. Learn how to increase the availability of your favorite applications, from blogs to enterprise-level infrastructure. If you are a student, you may want to consider the industrial-strength Open HA Cluster software for your thesis research. You will also have the unique opportunity to hear one of the featured guest speakers, Dr. David Cheriton, industry expert and professor at Stanford University.

Friday Mar 06, 2009

In addition to my day job as an engineer on the Sun Cluster team, I spent most of my nights and weekends last year writing a tutorial and reference book on OpenSolaris. OpenSolaris Bible, as it's titled, was released by Wiley last month and is available from amazon.com and all other major booksellers. In almost 1000 pages, my co-authors Dave, Jerry, and I were able to be fairly comprehensive, covering topics from the bash shell to the xVM Hypervisor, and most everything in between. You can examine the table of contents and index on the book website.

Of particular interest to readers of this blog will be Chapter 16, “Clustering OpenSolaris for High Availability.” (After working on Sun Cluster for more than 8 years, I couldn't write a book like this without a chapter on HA clusters!) Coming at the end of Part IV, “OpenSolaris Reliability, Availability, and Serviceability,” this chapter is a 70-page tutorial on using Sun Cluster / Open HA Cluster. After the requisite introduction to HA clustering, the chapter jumps in with instructions for configuring a cluster. Next, it goes through two detailed examples. The first shows how to make Apache highly available in failover mode using a ZFS failover file system. The second demonstrates how to configure Apache in scalable mode using the global file system. Following the two examples, the chapter covers the details of resources, resource types, and resource groups, shows how to use zones as logical nodes, and goes into more detail on network load balancing. After a section on writing your own agents using the SMF Proxy or the GDS, the chapter concludes with an introduction to Geographic Edition.

This chapter should be useful both as a tutorial for novices and as a reference for more advanced users. I enjoyed writing it (and even learned a thing or two in the process), and hope you find it helpful. Please don't hesitate to give me your feedback!

Tuesday Feb 24, 2009

You have heard the saying "Practice what you preach", and here at Solaris Cluster Oasis we often talk about how important high availability is for your critical applications. Beyond just the good sense of using our own products, there is no substitute for actually using your own product day in and day out. It gives us engineers a very important dose of reality, in that any problems with the product have a direct impact on our own daily functioning. That raises the question: How is the Solaris Cluster group dealing with its own high availability needs?

In this blog entry we teamed up with our Solaris Community Labs team to give our regular visitors to Oasis a peek into how Solaris Cluster plays a role in running key pieces of our own internal infrastructure. While a lot of Sun's internal infrastructure uses Solaris Cluster, for the purpose of this blog entry we ended up choosing one of the internal clusters that the Solaris Cluster engineering team uses directly for its own home directories (yes, that is right, home directories, where all of our stuff lives, are on Solaris Cluster) and for developer source code trees.

See below for a block diagram of the cluster; continue after the diagram for more details about the configuration.

Here are some more specifications of the Cluster:

- Two T2000 servers

- Storage consists of four 6140s presenting RAID5 LUNs. We chose the 6140s to provide RAID, partly because they were there and partly to leverage the disk cache on these boxes to improve performance.

- Two Zpools configured as RAID 1+0, one for home directories and another for workspaces (workspace is engineer-speak for source code tree)

- Running S10U5 (5/08) and SC3.2U1 (2/08)

High availability was a key requirement for this deployment, as downtime for a home directory server with a large number of users was simply not an option. For the developer source code too, downtime would mean that long-running source code builds would have to be restarted, leading to a costly loss of time. Not to mention that having lots of very annoyed developers roaming the corridors of your workplace is never a good thing.

Note that it is not sufficient to merely move the NFS services from one node to the other during a failover; one has to make sure that any client state (including file locks) is failed over as well. This ensures that the clients truly don't see any impact (apart from perhaps a momentary pause). Additionally, deploying different Zpools on different cluster nodes means that the compute power of both nodes is utilized when both are up, while we continue to provide services when one of them is down.

Not only did the users benefit from the high availability, but the cluster administrators gained maintenance flexibility. Recently, the SAN fabric connected to this cluster was migrated from 2 Gbps to 4 Gbps, and a firmware update (performed in single-user mode) was needed on the fibre channel host bus adapters (FC-HBAs). The work was completed without impacting services, and the users never noticed. This was achieved simply by moving one of the Zpools (along with the associated NFS shares and HA IP addresses) from one node to the other (with a simple click on the GUI) and upgrading the FC-HBA firmware on the node that had been vacated. Once the update was complete, we repeated the same steps with the other node and the work was done!
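The rolling procedure described above can be sketched as a toy model. This is only an illustration of the pattern, not Solaris Cluster code: the `Cluster` class, the node names `node-a` and `node-b`, and the step that moves each pool back to its original node after the update are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy two-node cluster: maps each zpool (with its NFS shares and HA IP) to a host."""
    nodes: tuple = ("node-a", "node-b")  # hypothetical node names
    placement: dict = field(default_factory=lambda: {
        "home": "node-a",        # zpool for home directories
        "workspaces": "node-b",  # zpool for source code trees
    })

    def other(self, node):
        """Return the peer of `node` in this two-node cluster."""
        return self.nodes[1] if node == self.nodes[0] else self.nodes[0]

    def evacuate(self, node):
        """Switch every pool hosted on `node` over to the other node."""
        moved = [pool for pool, host in self.placement.items() if host == node]
        for pool in moved:
            self.placement[pool] = self.other(node)
        return moved

    def restore(self, node, pools):
        """Move the given pools back to `node` (an assumption, not in the post)."""
        for pool in pools:
            self.placement[pool] = node

    def pools_online(self, node_in_maintenance):
        """True if no pool is hosted on the node that is being worked on."""
        return all(host != node_in_maintenance for host in self.placement.values())


def rolling_maintenance(cluster, task):
    """Run `task` on each node in turn while the other node hosts all pools."""
    log = []
    for node in cluster.nodes:
        moved = cluster.evacuate(node)
        assert cluster.pools_online(node)  # services stay up during the work
        task(node)                         # e.g. the FC-HBA firmware update
        cluster.restore(node, moved)
        log.append((node, moved))
    return log


if __name__ == "__main__":
    cluster = Cluster()
    upgraded = []
    rolling_maintenance(cluster, upgraded.append)
    print("upgraded:", upgraded)
    print("final placement:", cluster.placement)
```

The point of the model is the invariant checked inside the loop: at no time does a pool sit on the node under maintenance, which is why users see at most a brief pause during each switchover.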

While the above sounds useful for sure, we think there is a subtler point here, that of "confidence in the product". Allow us to explain: While doing a hardware upgrade on a live production system as described above is interesting and useful, what is really important is that the system administrator is able to do this without taking a planned outage. That is only possible if the administrator has full confidence that, no matter what, the applications will keep running and the end users will not be impacted. That is the TRUE value of having a rock solid product.

I hope readers found this example useful. I am happy to report that the cluster has been performing very well and we haven't (yet) had any episodes of angry engineers roaming our corridors. Touch wood!

During the course of writing this blog entry, I got curious about the origins of the phrase "Eating one's own dog food". Some googling led me to this page. Apparently the phrase has its origins in TV advertising and came over into IT jargon via Microsoft. Interesting...

Thursday Jan 29, 2009

I am sure you have seen the recent blog post and the announcement of Solaris Cluster 3.2 1/09. This release has a cool set of features, which includes providing high availability and disaster recovery in virtualized environments. It is also exciting to see distributed applications like Oracle RAC run in separate virtual clusters! Some of the features integrated into this release were developed in the Open HA Cluster community. That is another first for us!

I am sure the engineers in the team will be writing blog journals in the coming weeks, detailing the features that have been developed. Stay tuned for more big things from this very energetic and enthusiastic team!

It is very interesting to see the product getting rave reviews from you, our customers. We value your feedback and take extra steps to make the product even better. We appreciate the acknowledgement! Here is one customer success story from EMBARQ that will catch your attention: "Solaris Cluster provides a superior method of high availability. Our
overall availability is 99.999%. No other solution has been able to
ensure us with such tremendous uptime" — Kevin McBride, Network
Systems Administrator III at EMBARQ

That is one big compliment! Thank you for your continued support and feedback!
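To put the 99.999% figure from the quote in perspective, here is a small back-of-the-envelope calculation. It is only arithmetic on the quoted percentage, assuming a 365-day year; the function name is made up for the example, and nothing here describes how EMBARQ actually measures availability.

```python
def allowed_downtime_minutes(availability_percent, days=365):
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare three, four, and five nines over one (assumed 365-day) year.
    for pct in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% availability allows {minutes:.2f} minutes of downtime per year")
```

At five nines, that works out to a little over five minutes of downtime in a whole year.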

A distributed team has made this release possible. Some of the features are large and needed coordinated effort across various boundaries (teams, organizations, continents)! A BIG thanks to the entire Cluster Team for their hard work, to get yet another quality product released. It is a great team!