Ride the next big wave: server clusters

Mar 16, 1998

Network administrators might as well get ready now, because there's no sidestepping the next big server trend: clusters. From the workgroup server under the desk to the mammoth workhorses in the data center, clustering soon will be everywhere.

Microsoft Cluster Server software, code-named Wolfpack, is already part of Microsoft Windows NT Server 4.0 Enterprise Edition. Expect a more robust version in next year's release of NT Server 5.0 (see story, Page 32). That release will encourage many administrators to move their networks from single servers to two-server clusters and beyond. Cluster technology will dominate the server side of the enterprise for the next few years.

The GCN Lab offers a short course in Clustering 101.

The server operating system is only a small part of clustering readiness. Applications also have to be cluster-aware, and the server hardware and network infrastructure will need more administrator attention.

In simple failover, the most basic form of clustering, two servers make a cluster. If one fails, the other takes over. But if the surviving server comes up short in processing power, memory or storage resources, it too can fail or slow users down.

Microsoft Corp. certifies clustering hardware and configurations for Cluster Server. Compaq Computer Corp. is certifying as many of its recent ProLiant servers as possible. Other makers such as Dell Computer Corp. are producing servers specifically for clustering--for example, the Dell PowerEdge 6100 Cluster.

Hardware certification is crucial. Microsoft will not support its Cluster Server on unapproved hardware or configurations.

Although clustering can be done in many ways, simple failover will gain the broadest use, because Microsoft Cluster Server will be so widely available in NT 5.0.

In a failover setup, the two clustered servers connect to each other via dedicated network interface cards. The NICs exchange what is called a heartbeat signal. If an application, resource or server fails, the heartbeat stops, and an alert goes out for the other server to take over.
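To make the heartbeat idea concrete, here is a minimal Python sketch of how a surviving node might decide that its partner has stopped beating. It is a hypothetical illustration, not Cluster Server's code: the class and function names, the one-second interval and the two-miss threshold are all assumptions.

```python
class HeartbeatMonitor:
    """Tracks heartbeats from a peer node and reports when it looks dead.

    Hypothetical sketch: interval_s and misses_allowed are assumed values,
    not Cluster Server's actual settings.
    """

    def __init__(self, interval_s: float = 1.0, misses_allowed: int = 2):
        self.interval_s = interval_s
        self.misses_allowed = misses_allowed
        self.last_beat = None  # time of the most recent heartbeat, if any

    def record_beat(self, now: float) -> None:
        # Call this whenever a heartbeat arrives over the dedicated NIC.
        self.last_beat = now

    def peer_is_dead(self, now: float) -> bool:
        # The peer is judged dead once it has been silent for
        # misses_allowed consecutive heartbeat intervals.
        if self.last_beat is None:
            return False
        return now - self.last_beat >= self.interval_s * self.misses_allowed


def check_and_failover(monitor: HeartbeatMonitor, now: float) -> str:
    # The surviving node takes over only when the heartbeat has stopped.
    return "take over" if monitor.peer_is_dead(now) else "standby"
```

For example, after a beat at time 0.0, a check at 1.0 returns "standby" and a check at 2.0 returns "take over".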

Recovery can take up to 30 seconds, depending on the setup and applications. For example, a network resource might be polled once per second through the heartbeat. If the resource fails two consecutive polls, it is judged dead, and the server will either restart it or fail over to the other server. Each restart attempt delays failover. If restart is attempted four times, with two seconds of failed polling per attempt, downtime before failover would total eight seconds.
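The eight-second figure is simple arithmetic. The sketch below assumes, as the example does, that each restart attempt costs one detection window of two consecutive one-second polls; the function name is hypothetical.

```python
def downtime_before_failover(poll_interval_s: float,
                             misses_to_declare_dead: int,
                             restart_attempts: int) -> float:
    """Seconds spent detecting failures and retrying restarts before failover.

    Assumes every restart attempt fails and each failure is detected only
    after misses_to_declare_dead consecutive failed polls.
    """
    detection_time = poll_interval_s * misses_to_declare_dead  # 1 s x 2 polls
    return detection_time * restart_attempts


print(downtime_before_failover(1.0, 2, 4))  # 8.0
```

Setting restart_attempts to zero models a configuration that fails over immediately instead of retrying.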

Data loss is supposed to be limited to nonshareable information on the client and possibly data from the process that was executing when the failure occurred. Users would notice nothing but a slight delay.

The failover cluster, though it has two servers, appears to the user as one virtual server. Each physical server knows its own TCP/IP address plus the address assigned to the virtual server.
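A rough sketch of the virtual-server idea: clients always address one virtual IP, and failover changes only which physical node answers for it. The class, node names and addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and a real cluster must also announce the address move on the network, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class VirtualServer:
    """A virtual IP address owned by whichever physical node is active.

    Hypothetical sketch; names and addresses are illustrative only.
    """
    virtual_ip: str
    nodes: dict[str, str]  # node name -> that node's own physical IP address
    owner: str             # node currently answering for virtual_ip

    def fail_over(self, failed_node: str) -> str:
        # Only a failure of the current owner moves the virtual address.
        if failed_node == self.owner:
            survivors = [n for n in self.nodes if n != failed_node]
            if not survivors:
                raise RuntimeError("no surviving node to own the virtual IP")
            self.owner = survivors[0]
        return self.owner


vs = VirtualServer("192.0.2.10",
                   {"node-a": "192.0.2.11", "node-b": "192.0.2.12"},
                   owner="node-a")
print(vs.fail_over("node-a"))  # node-b; clients still use 192.0.2.10
```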

Microsoft's two-node cluster will work with shared data files, networked printers and World Wide Web pages under its Internet Information Server product.

Other server applications fall into cluster-aware and nonaware categories. Some of Microsoft's most recent products, such as the enterprise editions of Exchange Server 5.5 and SQL Server, are now cluster-aware.

Cluster-compatible, as opposed to cluster-aware, applications come from other software companies, such as Lotus Development Corp. Lotus officials maintain that Lotus Domino Server 4.6 coexists with Microsoft Cluster Server, but Domino's Wolfpack support is limited.

Domino does its own proprietary clustering, but that clustering covers only Domino and no other applications, much as linked, distributed e-mail servers do. Domino permits up to six nodes, and each can run a different operating system.

Novell Inc. soon will announce a clustering package code-named Orion. It might be released concurrently with or as part of NetWare 5, and it is expected to do 16-node clustering. Beta versions of cluster-aware Novell applications probably will arrive by early summer.

The next version of Windows NT may or may not expand beyond simple two-node failover. Eventually NT could support as many as eight or 16 nodes under distributed parallel clustering. Here, the servers no longer exchange a heartbeat. They communicate through their own system area network (SAN), which is separate from the LAN or WAN.

SAN communication becomes even more important than the number of processors, because the SAN relays so much information.

Under distributed parallel clustering, storage is split off from the server CPUs. Each server has redundant arrays of independent disks or other high-bandwidth storage subsystems. The SAN must handle not only requests from the clients and other normal server functions but also the data storage.

As clustering products gradually emerge, load balancing will become the industry's war cry.

Load balancing means directing tasks to individual processors so that no processor is overloaded while others sit idle.

The cluster OS makes the decisions on where to send tasks to equalize the workload among different servers and processors. The task will become highly complex as four-way and perhaps eight-way servers appear on enterprise networks.

A distributed parallel cluster of eight servers with four processors apiece has 32 possible routes to compute something. The processors don't work in concert as in a massively parallel system. Instead, each works separately on a manageable task.
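One simple way a cluster OS could spread work is to hand each new task to whichever processor currently carries the least load. The sketch below assumes that greedy least-loaded policy, which the article does not specify; the function name and cost units are hypothetical, and a real cluster OS would also weigh factors such as data location and SAN traffic.

```python
import heapq


def assign_tasks(task_costs, servers=8, cpus_per_server=4):
    """Greedy least-loaded assignment of tasks to processors.

    Hypothetical sketch. Returns a dict mapping (server, cpu) to the
    total load assigned to that processor.
    """
    # Min-heap of (current load, server index, cpu index): 8 x 4 = 32 entries.
    heap = [(0, s, c) for s in range(servers) for c in range(cpus_per_server)]
    heapq.heapify(heap)
    for cost in task_costs:
        load, s, c = heapq.heappop(heap)          # least-loaded processor
        heapq.heappush(heap, (load + cost, s, c))  # give it this task
    return {(s, c): load for load, s, c in heap}


loads = assign_tasks([5] * 64)
print(len(loads), set(loads.values()))  # 32 {10}
```

With 64 equal tasks, each of the 32 processors ends up with two.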

An alternative to this scenario is Oracle Corp.'s Parallel Server for NT, which does distributed cooperative clustering on up to four nodes.

Although some load balancing occurs, the four nodes converge on a single, shared storage subsystem that must be highly reliable. If it isn't, the number of processors won't matter, because the data will be unavailable.
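Why the shared storage matters more than the processor count can be shown with a simple availability model. The sketch assumes that node and storage failures are independent and that the cluster is usable whenever at least one node and the shared storage are up; the function name and availability figures are illustrative, not measured.

```python
def cluster_availability(node_availability: float, nodes: int,
                         storage_availability: float) -> float:
    """Probability the cluster can serve data under a simple independence model.

    Assumes independent failures; the cluster is up if at least one node
    is up AND the single shared storage subsystem is up.
    """
    at_least_one_node_up = 1 - (1 - node_availability) ** nodes
    return at_least_one_node_up * storage_availability


print(cluster_availability(0.99, 4, 0.95))
```

With 99 percent available nodes and 95 percent available storage, four nodes yield just under 95 percent, and no number of added nodes can push the result above the storage figure.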

In Wolfpack's two-node failover, both servers must be capable of handling the whole load, so load balancing is not an important consideration. Adding a second, similar server to an overburdened 200-MHz Pentium Pro server would improve performance little if at all.

If one of the servers cannot handle the full load, it will fail when it is needed, leaving no server to run the workload. Therefore, under Wolfpack, both servers must be robust and almost identical in configuration.
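The sizing rule can be written as a quick check: in two-node failover, each server alone must be able to carry the combined load. This is a hypothetical sketch; the function name is made up, and load and capacity are in arbitrary but matching units (here, percent of one server's capacity).

```python
def survivor_can_carry_load(loads, capacities):
    """Return True if any single surviving node could absorb the whole load.

    Hypothetical sketch for simple failover: whichever server survives must
    handle the work of both, so every node's capacity must cover the total.
    """
    total_load = sum(loads)
    return all(capacity >= total_load for capacity in capacities)


print(survivor_can_carry_load([60, 50], [100, 100]))  # False: survivor faces 110
print(survivor_can_carry_load([40, 45], [100, 100]))  # True
```

Two servers running at 60 and 50 percent fail the check, because the survivor would face 110 percent of its capacity.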

Compaq may get all of its ProLiant line certified, but more recently designed products will likely be better optimized for a clustered environment.

Compaq offers an excellent CD-ROM with a step-by-step guide on how to set up a cluster. Compaq provides the CD exclusively to GCN readers.

Call 800-392-9299. When prompted for the reseller code, enter 5555. You must give your name and address.