Ensuring Uptime: Clustering vs. Dedicated Fault-Tolerant Technology

Uptime. Downtime. Those were the two words systems administrators lived and breathed. Whether because of hardware failure, application problems or scheduled maintenance, there was always going to be a time when data was offline.

Businesses just had to deal with it. Today, however, the 24-hour economy and the Internet have changed the expectations users have of their systems. Now, downtime could spell a business' death.

Two current approaches, clustering and dedicated fault-tolerant technology, aim to ensure continuous availability of data. Both are routes to a common goal, but they rest on different economic models and different technologies. Understanding how each works is crucial for solution providers, who could be asked to recommend one approach over the other.

Clustering

In simple terms, clustering means connecting two or more computers so that they behave like a single computer. The term covers a number of ways to group servers in order to distribute load and eliminate single points of failure within a business-critical system.

Clustering solutions are employed for parallel processing, load-balancing and, most commonly, fault tolerance. Proponents of clustering suggest that the approach can help an enterprise achieve close to 100 percent availability in some cases. One of the attributes of clustering is that, to the outside observer, the cluster appears to be a single system. One common use of clustering is to load-balance traffic on high-traffic Web sites.
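The load-balancing idea can be illustrated with a minimal Python sketch. It is not how any particular clustering product works; the Cluster class, its methods and the node names are hypothetical, chosen only to show callers talking to one entry point while requests rotate across whichever nodes are still healthy.

```python
class Cluster:
    """Toy cluster: callers see a single entry point, requests rotate over healthy nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # index of the node to try first on the next request

    def mark_failed(self, node):
        # A failed node stays in the rotation list but is skipped when routing.
        self.healthy.discard(node)

    def mark_recovered(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def route(self):
        """Return the node that should handle the next request (round-robin)."""
        # Try each node at most once, starting from the rotation pointer.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes left in the cluster")


# Hypothetical three-node Web cluster.
cluster = Cluster(["web1", "web2", "web3"])
before = [cluster.route() for _ in range(3)]
cluster.mark_failed("web2")
after = [cluster.route() for _ in range(3)]
```

In this sketch the caller never names a node; losing web2 only changes which machines answer, not how the cluster is addressed.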

Recently, building clusters around clustering software and low-end workgroup servers has gained popularity because it enables companies to leverage existing investments or build scalable fault-tolerant systems from relatively inexpensive components. IBM's Sysplex is one example of a clustering approach for a mainframe system.

Redundant Backup

The flip side of clustering is dedicated fault-tolerant technology, which typically takes the form of a system with two or more redundant computers, each performing tasks in unison. Should one of the computers fail, the other continues to work; an alarm alerts managers to replace the malfunctioning computer. When the replacement is installed, the system automatically mirrors a current, correct image of the computing environment, ensuring continuous operation with no lost data.
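The sequence described above (both units working in unison, one failing, an alarm, then a replacement being brought up to date) can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of any vendor's hardware: the RedundantPair class, the unit names "A" and "B" and the alarm text are all hypothetical.

```python
class RedundantPair:
    """Toy model of two redundant units that apply every write in unison."""

    def __init__(self):
        # Each unit holds a full copy of the state; None marks a failed unit.
        self.units = {"A": {}, "B": {}}
        self.alarms = []

    def _live(self):
        return [state for state in self.units.values() if state is not None]

    def write(self, key, value):
        live = self._live()
        if not live:
            raise RuntimeError("both units have failed")
        for state in live:  # every surviving unit performs the same work
            state[key] = value

    def read(self, key):
        live = self._live()
        if not live:
            raise RuntimeError("both units have failed")
        return live[0][key]

    def fail(self, name):
        self.units[name] = None
        self.alarms.append(f"replace unit {name}")  # alert the managers

    def replace(self, name):
        live = self._live()
        if not live:
            raise RuntimeError("no surviving unit to mirror from")
        # The new unit starts from a current copy of the survivor's state.
        self.units[name] = dict(live[0])


pair = RedundantPair()
pair.write("orders", 41)
pair.fail("A")            # unit A fails; unit B keeps working
pair.write("orders", 42)  # no interruption for the caller
pair.replace("A")         # replacement is mirrored from unit B
```

After the replacement, both units again hold identical state, and the write made during the failure is not lost.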

That approach, which evolved from the telecommunications environment, is more expensive up front, but requires less human intervention. As a result, it reduces operating costs over time.

So, when your customers ask for a fault-tolerant solution, which technology do you recommend? Either is a viable choice, but circumstances will dictate which is the best candidate for the job. The following two solution providers found the right customer matches, vendors and technologies. Here's what they did.

SiegeWorks On Fault Tolerance

Fault-tolerant systems, such as those offered by NEC, Resilience and Stratus Technologies, are most useful in situations where ease of operation, or even hands-off operation, is required. Jeff Bennett, CEO and president of Pleasanton, Calif.-based solution provider SiegeWorks, says that makes them very useful in engagements outside of major metropolitan areas, where technicians educated in the intricacies of managing clusters are difficult to find and even more difficult to afford.

"They're ideal for remote locations where it's hard to find dedicated staff and in situations where hiring a dedicated staff would break the bank," Bennett says.

SiegeWorks is based in the East Bay but does a lot of work throughout Northern California, and its arsenal of products includes both clustering solutions and fault-tolerant systems such as Resilience's SPARC-based Ultra.

"In most cities out West, there is a lot of talent," he says, "but once you get out of the metropolitan areas, you find a lot of companies where the skillsets are not up to the level needed for a cluster. That puts them at the mercy of outside consultants."

Systems such as NEC's Express 5800/ft series are finding favor with customers including local and rural governments, which have the same record-keeping and public safety responsibilities as their more metropolitan counterparts but are forced to do the job with fewer financial resources.

These solutions are also uniquely useful for customers looking for "point solutions, like access control, firewalls or other specific mission-critical applications," Bennett says. They also help in situations where software carries a high price per server license; in a large server cluster, that could be a hefty expense.

Clustering, he says, brings with it a high level of service dollars, but those costs can put some opportunities out of reach. Dedicated fault-tolerant systems can be more expensive to install up front, but customers with smaller budgets can justify the up-front investment by looking at their return on investment over time. Clustering, on the other hand, is built on the idea that the more devices that can be added to a system, the better. That underpinning makes it more scalable, and thus more suitable for organizations anticipating growth, or even contraction, of their computing requirements.
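The return-on-investment argument comes down to simple arithmetic, sketched below in Python. Every figure in the example is made up purely for illustration, and the function names are hypothetical; the sketch assumes a fixed up-front cost plus a flat annual operating cost for each approach.

```python
def cumulative_cost(upfront, annual_operating, years):
    """Total spend after a number of years: purchase plus flat yearly operations."""
    return upfront + annual_operating * years


def break_even_year(ft_upfront, ft_annual, cl_upfront, cl_annual, horizon=10):
    """First year in which the fault-tolerant system costs no more than the cluster.

    Returns None if that never happens within the horizon.
    """
    for year in range(1, horizon + 1):
        ft_total = cumulative_cost(ft_upfront, ft_annual, year)
        cl_total = cumulative_cost(cl_upfront, cl_annual, year)
        if ft_total <= cl_total:
            return year
    return None


# Illustrative numbers only: a pricier fault-tolerant box with low running
# costs versus a cheaper cluster that needs more paid administration.
year = break_even_year(ft_upfront=100_000, ft_annual=10_000,
                       cl_upfront=60_000, cl_annual=30_000)
```

With these invented figures the two approaches cost the same after two years; if the operating costs were equal, the higher up-front price would never be recovered and the function would return None.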

Acropolis Systems' Take

Applications particularly well-suited to clustering include databases, load-balancing and electronic commerce back-end systems, although John Pham, CEO of Milpitas, Calif.-based solution provider Acropolis Systems, says the number of applications his customers perceive as ideal for clustering is constantly growing.

"Everything is mission-critical nowadays," Pham says, "but no one is willing to pay for the hardware to get uptime, especially if they've already made the investment in servers."

Pham has found that clustering has special resonance with customers that have existing data centers and a new sense of the importance of data protection.

Acropolis has had considerable success in the past year selling its customers solutions based on Sun's SunCluster management system. SunCluster dominates the market along with products from heavy hitters such as Hewlett-Packard, IBM and Oracle, with a host of smaller companies also vying for customers' attention.

Because clustering does involve a degree of complexity, support is a critical concern, which partially accounts for Pham's choice of Sun as a vendor. "One phone call, and I can get [Sun] on site," he says. "I would not recommend a customer to do clustering without gold or platinum support from Sun."

A particularly lucrative market for Acropolis has come from existing customers using Oracle's database. A typical installation might start with two Sun E5500s. Once they're installed, if a system has problems, "they can fail over to each other," Pham says, ensuring continued uptime and data protection. The same facility allows hardware and software additions to the cluster without having to incur downtime.

"We can back up the system to one server, just in case, and make our upgrades," Pham says.
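The failover and upgrade pattern Pham describes can be modeled with a short Python sketch. It is an illustration only and does not use SunCluster or any Oracle interface: the TwoNodeCluster class, the node names and the version strings are hypothetical. The point it shows is that the service always runs on one node while the other is taken down.

```python
class TwoNodeCluster:
    """Toy two-node cluster: one node is active, the other stands by."""

    def __init__(self, version):
        self.version = {"node1": version, "node2": version}
        self.up = {"node1": True, "node2": True}
        self.active = "node1"

    @staticmethod
    def peer(node):
        return "node2" if node == "node1" else "node1"

    def fail(self, node):
        """Mark a node as down; fail over if it was carrying the service."""
        self.up[node] = False
        if node == self.active:
            other = self.peer(node)
            if not self.up[other]:
                raise RuntimeError("both nodes are down")
            self.active = other

    def rolling_upgrade(self, new_version):
        """Upgrade one node at a time so the service is never offline."""
        if not all(self.up.values()):
            raise RuntimeError("both nodes must be up before upgrading")
        for node in ("node1", "node2"):
            if node == self.active:
                self.active = self.peer(node)  # move the service first
            self.up[node] = False              # take the idle node down
            self.version[node] = new_version
            self.up[node] = True               # bring it back as standby


cluster = TwoNodeCluster("1.0")
cluster.rolling_upgrade("2.0")  # service hops between nodes during the upgrade
cluster.fail("node1")           # a later failure triggers failover to node2
```

A real cluster also has to keep data consistent between the nodes and detect failures automatically; both are left out of this sketch.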

That also ensures that customers come back to Acropolis as they grow, Pham says. If the cluster is managed correctly, the result is a recurring revenue stream and a tighter relationship between the customer and the solution provider.

"We invested in a call center and procedures for helping avoid problems," Pham says. "We very rarely have to respond to problems on-site, but it's important to customers that we have that capability."