High-Availability System Architecture

Basic blueprints for file servers, Web servers, and DNS servers

You'll find a glut of articles that discuss high-availability concepts and strategies, as well as a plethora of articles that cover the engineering details of high-availability solutions' components. You're probably ready for an article that shows you how to put those components to use in your IT environment. Perhaps you've promised a high service level agreement (SLA) to your customers, and now you need to know how you're going to keep that promise. If you need to configure a high-availability Windows 2000 file server, Web server, or DNS server, you'll find this article's collection of basic blueprints extremely valuable.

High-Availability Management Phases

Application availability falls as the total application downtime in a given time period (typically a month) rises, and the total downtime is simply the sum of the duration of each outage. To increase a system's availability, you need to decrease the duration of outages, decrease the frequency of outages, or both. Before I discuss useful technologies, you need to understand the phases of a postoutage restoration.

In the event of a serious outage, you would need to build a new server from scratch and restore all the data and services in the time available to you. Suppose you've promised an SLA of 99.5 percent, and you're counting on only one outage per month. (For information about calculating availability percentages, see the sidebar "Measuring High Availability.") Within 3 hours and 43 minutes of the start of an outage, you would need to work through the following five restoration phases:

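Diagnostic phase—Diagnose the problem and determine an appropriate course of action.

Procurement phase—Obtain the replacement hardware, software, and backup media that the restoration requires.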

Base Provisioning phase—Configure the system hardware and install a base OS.

Restoration phase—Restore the entire system from media, including the system files and user data.

Verification phase—Verify the functionality of the entire system and the integrity of user data.

Regardless of your SLA, you need to know how long the phases take. Each phase can introduce unexpected and unwelcome delays. For example, an unconstrained diagnostic phase can take up the lion's share of your available time. To limit how much time your support engineers spend diagnosing the problem, set up a decision tree in which the engineers proceed to the procurement phase if they don't find what's wrong within 15 minutes. The procurement phase can also be time-consuming if you keep the backup media offsite and have to wait for it to be delivered. I once experienced a situation in which the truck delivering an offsite backup tape crashed en route to our data center. You might think 3 hours and 43 minutes is a short period of time in which to restore service, but in reality, you might have only about 2 hours to complete the actual restoration phase.
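The time budget quoted above follows from simple arithmetic. The sketch below assumes a 31-day month and a single outage per month, as this article does; the function name downtime_budget_minutes is illustrative only.

```python
def downtime_budget_minutes(sla_percent: float, days_in_month: int = 31) -> float:
    """Return the total downtime (in minutes) that an SLA permits in one month."""
    minutes_in_month = days_in_month * 24 * 60  # 44,640 minutes for 31 days
    return minutes_in_month * (1 - sla_percent / 100)


budget = downtime_budget_minutes(99.5)
hours, minutes = divmod(int(budget), 60)  # drop the fractional minute
print(f"{hours} hours and {minutes} minutes")  # 3 hours and 43 minutes
```

If you expect more than one outage per month, divide the budget by the number of outages to see how long each restoration can take.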

Blueprint for High-Availability File Servers

A file server doesn't require much CPU capacity or memory. To support 500 users and 200GB of data, you might use one small server, such as a Compaq ProLiant DL380 with two Pentium III processors and 512MB of RAM. With minimal accessories, such a setup costs about $9700 retail. If you use a DLT drive that provides an average transfer rate of 5MBps, physically restoring 200GB of data will take 666 minutes, or 11 hours and 6 minutes. Add an hour for the diagnostic, procurement, base provisioning, and verification phases, and you're looking at 726 minutes to recover from an outage. If you assume one outage per 31-day month (44,640 minutes), then $9700 buys you a file server with an SLA of 98.37 percent.
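The figures in this paragraph can be reproduced with a short calculation. This is a rough sketch that assumes 1GB equals 1000MB, a sustained 5MBps tape rate, and one outage in a 31-day month; the function names are made up for illustration.

```python
MINUTES_PER_31_DAY_MONTH = 31 * 24 * 60  # 44,640


def restore_minutes(data_gb: float, rate_mb_per_s: float) -> float:
    """Minutes needed to stream data_gb from tape at rate_mb_per_s (1GB = 1000MB)."""
    return data_gb * 1000 / rate_mb_per_s / 60


def achievable_sla(outage_minutes: float, outages_per_month: int = 1) -> float:
    """Availability percentage for a 31-day month, given the length of each outage."""
    downtime = outage_minutes * outages_per_month
    return 100 * (1 - downtime / MINUTES_PER_31_DAY_MONTH)


tape = int(restore_minutes(200, 5))  # 666 (the fractional minute is dropped)
outage = tape + 60                   # plus one hour for the other four phases
print(tape, outage, round(achievable_sla(outage), 2))  # 666 726 98.37
```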

To increase the availability of file servers, you can use standard strategies: Reduce the time required to restore the file share and data during an outage and reduce the frequency of outages. Many technologies address each of these strategies for file servers. As a starting point, let's look at basic implementations that use the following techniques: data partitioning, snapshot backup-and-restore technologies, and fault-tolerant systems.

Data partitioning. In the configuration that Figure 1 shows, FileServer2 contains product data, FileServer3 contains images, and so on. To make this partitioning transparent to the user, you can implement a technology such as Microsoft Dfs, which lets you create a virtual file system from the physical nodes across the network. A user who connects to \\fileserver1\share would see a directory structure that appears to show all data as if it were residing on FileServer1, even though some of the data physically resides on FileServer2.

Table 1 shows the availability you can achieve through data partitioning. (This table uses a typical SLA formula and assumes that you need to restore only one server during the outage.) The cost per server goes down as you accumulate servers and as the number and size of the servers' disks decrease. Obviously, the partitioning option is costly in terms of server hardware, so you need to decide whether you can live with an average of 12 hours 6 minutes (726 minutes) unscheduled downtime per month. Perhaps spending an extra $23,800 ($33,500 minus $9700) to reduce that time to 3 hours 13 minutes (193 minutes) makes sense for you. Data partitioning is particularly cost-prohibitive if you have huge quantities of data and need dozens of servers or more.
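To see how partitioning shortens an outage, you can rerun the restore arithmetic for different server counts. The sketch below is not a reproduction of Table 1; it assumes the 200GB is split evenly, that only one server needs restoring, that the tape rate stays at 5MBps, and that the other phases still take one hour. The function name outage_minutes is hypothetical.

```python
def outage_minutes(servers: int, total_gb: float = 200, rate_mb_per_s: float = 5,
                   overhead_minutes: int = 60) -> int:
    """Whole minutes to recover one server when data is split evenly across `servers`."""
    per_server_mb = total_gb * 1000 / servers
    restore = int(per_server_mb / rate_mb_per_s / 60)  # fractional minute dropped
    return restore + overhead_minutes


for n in range(1, 6):
    minutes = outage_minutes(n)
    sla = 100 * (1 - minutes / 44640)  # one outage in a 31-day month
    print(f"{n} server(s): {minutes} min outage, {sla:.2f}% availability")
```

Under these assumptions, one server gives the 726-minute outage discussed earlier, and five servers give the 193-minute figure.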

Snapshot backup and restore. An alternative to data partitioning is to implement faster technology. Faster tape drives won't necessarily provide a quantum leap in performance, so you'll need to use snapshot backup-and-restore technology, which is typically available from Independent Hardware Vendors (IHVs) of enterprise storage systems, such as EMC and Compaq. Upcoming software snapshot products might change this equation, but for now, you need to address the enterprise storage vendors.

Figure 2 shows a Storage Area Network (SAN) environment.

You connect the SAN hardware to the server through a fibre channel connection (preferably redundant) and access the file system as if it were local. You can use a snapshot utility on the SAN to perform a quick backup (typically measured in seconds), then restore the data from a backup disk almost as quickly. As long as you create snapshots relatively frequently, you can restore data within the confines of even the most stringent SLA. Snapshot functionality might even be irrelevant—at my company, for the second half of 2001, our EMC SAN experienced 100 percent uptime, our Brocade switches boasted 99.9999861 percent availability, and we never experienced disk problems that required us to restore from snapshots. If you're concerned that the SAN might fail, you could implement redundant SANs with failover technology.

Unfortunately, unless a SAN replaces hundreds of small file servers, many hardware snapshot products and SAN technologies are prohibitively expensive. Even a small 400GB EMC and Brocade SAN infrastructure can cost $300,000. When you compare that price with that of two servers, each with 200GB of local storage—$19,400—you begin to understand the cost of very high availability. Cost-effective SAN and Network Attached Storage (NAS) vendors exist, so be sure to shop around carefully before you pass any final judgments about SLA costs.

Fault-tolerant systems. The two previous high-availability strategies focus on reducing the time necessary to restore the server and data. The third strategy involves implementing redundant systems that continue serving the client indefinitely if one system fails. You can make many components redundant—servers, disks, NICs, UPSs, switches, and so on. Some of these components are easy to add and relatively inexpensive. For example, if you add redundant NICs, power supplies, and disk controllers to the aforementioned ProLiant DL380 system, the cost rises from $9700 to about $11,600. However, ask yourself whether you need to spend that money. At my company, we're experiencing less than 0.025 percent failure rate on those components. (Probably the most crucial—and by far the cheapest—item you need is a UPS. If you don't have a UPS, put this article down and deploy one before reading any further.)

Let's look at three technologies for implementing redundant data and redundant servers: Dfs, RAID, and server clusters. The system in Figure 1 distributes user data across several physical locations and uses Dfs to present a simple, logical view of the data. If you have an exact replica of any of the file directories—for example, the \products directory—you can mount both the original and the replica at one point in the Dfs namespace. When users traverse a directory tree in Windows Explorer to reach the \products directory, they might be viewing data residing at the original or the replica. Dfs doesn't require that data in the various shares mounted at one point be identical, but you can configure Dfs so that it replicates the data on a schedule. If the server holding the replica crashes, users can still open files in the original location—a somewhat fault-tolerant scenario. If you configured multiple replicas, you would have even more redundancy. (If you want to have redundancy for the top-level share as well as the replicas of the data, you need to create a fault-tolerant Dfs root. For more information, see the Dfs documentation.) Any data not written to disk on the crashed server obviously would be lost, as would any nonreplicated data. The Dfs replication process isn't well suited for highly dynamic data, so you need to evaluate this technology carefully to determine whether it's appropriate for your SLA strategy.

You can set up Dfs to replicate data between servers. RAID addresses distribution and replication of data between one server's disks. On paper, computer disks boast extremely high reliability—for example, Seagate Technology claims that its Cheetah 36GB Ultra 160 SCSI disk provides 1,200,000 hours mean time between failures (MTBF), which means you can expect a failure approximately every 137 years. However, this MTBF value is deceiving. Hard disk failures are common and have many external causes. At my company, hard disks are our top-ranking hardware service item; we repair or replace an average of 66 disks per year in one data center that holds approximately 8000 disks of various ages. Executive Software's study "Survey.com Hard Drive Issues Survey" (http://www.execsoft.com/diskalert/reviews/hard-drive-survey.asp) uncovered similarly frightening statistics: 62 percent of data-center IT administrators rank disk failures as their top disk problem and estimate the average life of a SCSI disk at 3 or 4 years. The trick is to implement fault-tolerant RAID technology so that disk failures don't result in downtime.
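A quick calculation shows why a per-disk MTBF tells you little about a large installation. The sketch below assumes a constant, independent failure rate for every disk, which is a simplification (real failure rates vary with age and environment); the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF rating in hours to years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(disk_count: int, mtbf_hours: float) -> float:
    """Expected annual failures in a fleet, assuming a constant, independent failure rate."""
    return disk_count * HOURS_PER_YEAR / mtbf_hours


print(round(mtbf_years(1_200_000)))                        # 137
print(round(expected_failures_per_year(8000, 1_200_000)))  # 58
```

Under that assumption, even disks that meet the vendor's rating would produce dozens of failures per year across 8000 disks, which is the same order of magnitude as the 66 replacements described above.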

RAID 1 technology mirrors the disks in realtime: If one disk crashes or becomes corrupted, the other disk continues to operate as usual and the system sees no performance degradation—although it's now operating without redundancy. The system that Figure 3 shows uses RAID 1 mirroring for the OS and the swap files. If any disk fails, you can remove it and replace it with a healthy disk, without turning off the computer. The RAID 1 SCSI controller creates a copy of the OS or swap file on the new disk, then reestablishes fault tolerance. No downtime occurs as a result of a disk failure, albeit at the expense of doubling the number of drives in the system. (For further information about RAID technology, go to the Advanced Computer & Network Web site at http://www.acnc.com/index.html.)

RAID 5 technology provides fault tolerance by allocating portions of each disk in the array to parity data. This setup enables realtime reconstruction of corrupted data if one disk in the array fails. The parity data reduces the amount of usable space in the array by the equivalent of one disk out of the entire array. As with RAID 1, you can remove and replace the failed disk without turning off the computer, and you experience no downtime, although you do experience some performance degradation while the array runs with a failed disk. Because parity consumes only one disk's worth of capacity in the entire array, RAID 5 is much less costly than RAID 1. The configuration in Figure 3 uses RAID 5 for the file data. In the event of one disk's failure, the client probably wouldn't notice the degradation in the disk array's performance. In our example, if you used four 72GB disks, you would have three 72GB disks' worth of usable storage, or 216GB for user data.
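The parity idea can be illustrated in a few lines. The sketch below uses XOR parity on a single stripe to rebuild the block from a failed disk; a real RAID 5 controller rotates parity across the disks and works on much larger blocks, so treat this only as a simplified model. The helper names are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_usable_gb(disk_count: int, disk_gb: int) -> int:
    """Usable capacity: one disk's worth of space goes to parity."""
    return (disk_count - 1) * disk_gb


# One stripe across four disks: three data blocks plus one parity block.
data = [b"file", b"serv", b"er01"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuild its block from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])      # True
print(raid5_usable_gb(4, 72))  # 216
```

Because every read of the missing block requires reading all the surviving disks, the array keeps working but runs slower until the failed disk is replaced and rebuilt.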

The configuration in Figure 3 is typical of data-center systems. My company used this template last year for most systems, and those systems experienced no downtime as a result of physical disk failures, despite the 66 disks that we needed to replace.

Table 2 summarizes the cost of this redundancy for a ProLiant DL380 system that has 200GB of user data and 72GB disks for the data. The cost of the RAID systems is particularly high because the ProLiant DL380 can't house 8 to 10 drives without an external chassis, which adds considerably to the price. Drive redundancy doesn't protect against data corruption that results from software problems, and you might still need to restore from tape for a variety of reasons. Data redundancy does, however, protect you from the need to restore from tape because of a disk failure. You need to assess your SLA to determine whether you can justify the cost.

Depending on your environment and hardware, your next weakest link might be either a network device or the servers. To create a redundant server environment suitable for a high-availability file server, you can implement a simple server cluster in Win2K Advanced Server.

Figure 4 shows a cluster that includes RAID technology for the disks.

You can configure a server cluster in many ways, but the basic concept is that if one server fails, another server takes over the failed server's functions. In the case of a file server, if a failover occurs from one system to another, users can continue working on a document stored on a shared disk array, possibly noticing a short delay while their applications reconnect to the cluster. Meanwhile, you can take the failed server offline and repair it without affecting the users' operations or your SLA. When you finish repairing the server, you can rejoin it to the cluster and regain server redundancy. (Some applications aren't cluster aware, so be sure to check the cluster documentation carefully before you deploy a cluster solution.)

Table 3 summarizes the hardware cost of server redundancy for two similarly configured ProLiant DL380 servers with 200GB of file data in a shared external drive chassis. These numbers are approximations. The recommended Compaq solution replaces the shared SCSI channel with a fibre channel configuration, but I kept the SCSI channel to keep prices down. Also, cluster support can involve additional software and operational costs—for example, whereas you can use Win2K Server to install one file server, a cluster requires Win2K AS.

Server redundancy won't reduce the time necessary to restore data (as the partitioning options do) and won't create redundant copies of the data (as the RAID options do). This option only increases the availability of the server that publishes the disk data to the user. If your weakest link isn't the server, you might not need server clustering. For example, in our data center, fewer than 1 percent of our clients felt that the additional reliability of clustered file servers merited the cost.

Blueprint for High-Availability Web Servers

In the media, you can find many statistics about the availability of Web servers. Every time a major corporation or government entity has a Web site problem, the news makes headlines. An interesting source for availability numbers is Keynote Systems, which publishes the Keynote Government 40 Index and the Keynote Business 40 Index. The October 29, 2001, index showed the Federal Bureau of Investigation (FBI), Library of Congress, and Supreme Court Web sites ran at 99.24 percent, 99.96 percent, and 99.62 percent availability, respectively. Similarly, during the Christmas holiday season, the average availability of the top 10 shopping Web sites (e.g., Nordstrom, Neiman Marcus, Saks Fifth Avenue) was 98.5 percent. How do you achieve such levels of availability for a Web server?

Your high-availability options for a Microsoft IIS Web server aren't terribly different from those for a file server. You can configure the system to reduce the time necessary to restore service and data after an outage, and you can reduce the frequency of outages. In addition to the techniques you use for file servers, two features of Win2K AS and IIS are available: Virtual Directories for data partitioning and Network Load Balancing (NLB) for mirrored servers.

Data partitioning in a Web server environment is similar to building a partitioned file system. The primary difference is that you use IIS's Virtual Directories feature instead of Dfs to provide the unified namespace. For example, suppose you have multiple file servers (i.e., FileServer1, FileServer2, and so on), one of which will be the Web server (i.e., FileServer1). You configure the file servers just as you would in a file-server environment, then configure FileServer1 to be your Web server. Next, you use the Internet Services Manager (ISM) tool to publish a series of virtual directories off the root of the Web server. To do so, in ISM, right-click the root of the Web site on \\fileserver1 and select New, Virtual Directory. Name the directory (e.g., \products), and identify the directory path for that data as \\fileserver2\products. When users access your Web site's root, they can access the Products page as if it were on the root Web server, but the data actually resides on the small back-end FileServer2. The back-end server is small enough for you to restore in less than the allotted time, but the farm appears as one Web site to users.
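Conceptually, a set of virtual directories is a lookup table from the first segment of the URL path to a back-end share. The sketch below models that lookup only; it doesn't call any IIS API, and the images share, the wwwroot path, and the function name resolve are assumptions made for illustration.

```python
# Hypothetical mapping of virtual directory aliases to back-end UNC shares.
VIRTUAL_DIRECTORIES = {
    "products": r"\\fileserver2\products",
    "images": r"\\fileserver3\images",
}
# Content outside any virtual directory comes from the front-end server (assumed path).
DEFAULT_ROOT = r"\\fileserver1\wwwroot"


def resolve(url_path: str) -> str:
    """Map a URL path such as /products/widget.htm to the UNC path that holds the file."""
    parts = [p for p in url_path.split("/") if p]
    if parts and parts[0].lower() in VIRTUAL_DIRECTORIES:
        base = VIRTUAL_DIRECTORIES[parts[0].lower()]
        rest = parts[1:]
    else:
        base = DEFAULT_ROOT
        rest = parts
    return "\\".join([base, *rest])


print(resolve("/products/widget.htm"))  # \\fileserver2\products\widget.htm
print(resolve("/index.htm"))            # \\fileserver1\wwwroot\index.htm
```

The user sees only the URL, so you can move a directory to a different back-end server by changing one entry in the mapping.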

In the case of a simple file server, you can use server redundancy (which the cluster example in Figure 4 shows) to reduce the frequency of perceived outages. In the case of a Web server, you have an additional option that's roughly equivalent to using Dfs replicas for a fault-tolerant file server. Because you're primarily reading individual document pages in a stateless action, whether subsequent reads take place from one server or another server is irrelevant. Therefore, you can provide a reasonably seamless user experience while you swap servers indiscriminately in the background. You can use Win2K AS's NLB to manage those transitions.

In this scenario, the best strategy is to simply build redundant servers, all of which contain the entire 200GB of data. To ensure that the data is identical on each server, you might use a service such as Dfs and the Win2K File Replication Service (FRS) to duplicate the information across the servers or use a third-party product to perform this type of replication. In a pure Microsoft environment, you might use the Site Server 3.0 Content Deployment Service or the Application Center 2000 Synchronization Service. Then, you install and configure the NLB service on all the servers so that they share one virtual IP (VIP) address.

When a user connects to the Web server cluster, the NLB service determines which server responds to the user. This determination depends on the NLB configuration. For example, in the case of the http://www.usi.net URL in Figure 5, the /www1.usi.net server might respond. If one of the servers fails, the NLB service simply fills the next user request from one of the remaining servers. The more servers you create, the less likely one server failure will affect your SLA. If you have two servers that each handle 50 percent of the user traffic, one server's failure increases the other server's load by 100 percent. If you have five servers, one server's failure increases the share of traffic on each of the other four servers from 20 percent to 25 percent. As long as you have sufficient redundancy to maintain acceptable user performance while the failed server is offline, you'll meet your SLA by reducing or eliminating the number of outages that the user perceives.
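The load figures in this paragraph generalize to any number of servers. The sketch below assumes that traffic is spread evenly across the servers both before and after the failure, which an actual NLB configuration might not do; the function name is illustrative.

```python
def load_after_failure(servers: int) -> tuple[float, float, float]:
    """Return (share before, share after, percent increase) per server when one server fails."""
    if servers < 2:
        raise ValueError("need at least two servers to survive a failure")
    before = 100 / servers
    after = 100 / (servers - 1)
    increase = (after - before) / before * 100
    return before, after, increase


for n in (2, 5):
    before, after, increase = load_after_failure(n)
    print(f"{n} servers: {before:.0f}% -> {after:.0f}% of traffic each (+{increase:.0f}%)")
```

With two servers, the survivor's load doubles; with five, each survivor's share rises from 20 percent to 25 percent, which is a 25 percent increase in that server's own load.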

Comparing the cost of using redundancy to enable a high-availability Web server with the cost of data partitioning a file server is interesting. In the scenario in Figure 1, the file servers collectively hold 200GB of available disk space for user data, whereas the scenario in Figure 5 requires that each Web server contain the full 200GB of disk space. Furthermore, in Figure 1, the file servers can be fairly small, regardless of whether you use one or five servers. A Web server, however, is more taxing on CPU capacity and memory; if you have only one Web server, you need a much more powerful machine, such as a ProLiant DL580 with four processors and 2GB of RAM. If you have multiple Web servers, you might be able to get away with the ProLiant DL380 in the example that Figure 1 shows.

Table 4 shows the relative costs. The price of a large server is greater than the price of multiple small servers. One reason for the higher cost is that the ProLiant DL580 needs an external drive chassis. Multiple load-balanced and redundant servers, with a theoretical SLA of 100 percent, are sometimes cheaper than one server that provides no redundancy.

Large public Web sites typically use a combination of technologies to achieve high availability. Figure 6 shows one method of combining strategies into an environment that boasts numerous high-availability components.

The scenario uses redundant load-balanced front-end Web servers, so the user will always be able to connect to the site. Server clustering ensures that a server is always available to handle file-system requests. Finally, to guarantee the availability of data, the system uses a redundant fibre channel fabric for access to an enterprise SAN. Only the number of servers in the front-end and back-end clusters—and any communications components between the client and servers—limit this architecture's availability. For more information about this kind of complex architecture, see the Microsoft article "Web Server Load Balancing and Redundancy" (http://www.microsoft.com/technet/treeview/default.asp?url=/technet/itsolutions/ecommerce/deploy/rollout/duwwsr.asp).
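One common way to estimate the availability of a layered design like this one is to combine redundant components in parallel and chained tiers in series. The sketch below uses the standard formulas under the assumption that failures are independent; the availability figures are invented placeholders, not measurements from this article.

```python
from math import prod


def parallel(availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one can carry the load."""
    return 1 - (1 - availability) ** copies


def series(*availabilities: float) -> float:
    """Availability of a chain in which every tier must be up."""
    return prod(availabilities)


# Hypothetical figures: each server is 99 percent available on its own.
front_end = parallel(0.99, 3)  # three load-balanced Web servers
back_end = parallel(0.99, 2)   # two-node file-server cluster
san = 0.9999                   # placeholder value for the storage tier
print(f"{series(front_end, back_end, san):.4%}")
```

The series step shows why the least available tier, including any network component between the client and the servers, caps what the whole architecture can deliver.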

Blueprint for High-Availability DNS Servers

You can build a high-availability DNS system in much the same way that you build a highly available file server, except that the quantity of data is typically much smaller. Your primary concern is typically not the time necessary to restore the data but rather the availability of the DNS server. Therefore, you probably don't need to worry about a solution that decreases a DNS server's restore time. Your architecture must ensure that a client requesting name resolution can always find a DNS server that contains your zone data. The most complex high-availability DNS solution you need is simply two or three servers that have complete copies of all the host records you want to publish.

When an Internet client needs to find the IP address of a server in your domain, it issues a DNS query, which starts a series of events that culminate in a DNS server sending a query on the client's behalf to your DNS server. For example, InterNIC's records for my company show that we're running four DNS servers. If a client enters the http://www.usi.net URL in a browser, a DNS query eventually arrives at one of our four DNS servers, which replies with the address of our Web server. If you register multiple DNS server addresses with InterNIC, DNS clients can address queries to any of your DNS servers, and if one of your DNS servers is unavailable, the clients can query the other servers. The result is that the client perceives continuous service, even if one of the DNS servers is down. For comparison purposes, at press time, Cisco Systems had registered two DNS servers, IBM had registered four, and Microsoft had registered six.
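The failover behavior described here amounts to trying each registered name server until one answers. The sketch below models that loop with a stand-in query function rather than real network traffic; the server names, the address, and the function names are all made up for illustration.

```python
from typing import Callable, Iterable, Optional


def resolve_with_failover(name: str, servers: Iterable[str],
                          query: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Ask each registered name server in turn; return the first answer received."""
    for server in servers:
        try:
            answer = query(server, name)
        except OSError:
            continue  # server unreachable: try the next one
        if answer is not None:
            return answer
    return None


# Stand-in for a real DNS query: ns1 is down, ns2 answers.
def fake_query(server: str, name: str) -> Optional[str]:
    if server == "ns1.example.com":
        raise OSError("timed out")
    return {"www.example.com": "192.0.2.10"}.get(name)


print(resolve_with_failover("www.example.com",
                            ["ns1.example.com", "ns2.example.com"], fake_query))
```

The client receives an answer as long as at least one server in the list responds, which is why two or three servers holding the same zone data are usually enough.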

To create redundant DNS servers, you install the DNS service on two or more servers. On one of those servers, you use DNS Manager to add the domain's host information. On each of the other DNS servers, you use DNS Manager to specify that the server is a secondary DNS server for the domain and that it should copy the host data from the primary DNS server. DNS takes care of the initial data replication from the primary server to the secondary servers, as well as any subsequent replication of updates if the data on the primary server changes. In a Win2K environment, you can specify that the DNS data reside in Active Directory (AD), in which case AD replication takes care of the DNS transfers and you don't need to specify primary and secondary DNS servers. In addition to increasing the number of secondary DNS servers for fault tolerance, you can create intermediary DNS servers that simply cache responses from your DNS servers without holding a copy of the DNS database. These caching servers reduce the load on your primary and secondary DNS servers by reducing the number of queries that ultimately reach those servers. Your DNS records might be cached in any number of other DNS servers on the Internet, increasing the resolving capacity of your system at no cost. (Some people refer to this phenomenon as a scale away strategy.)

Assemble the Building Blocks

You need to determine how much availability you truly need, as well as which components you can combine to produce your chosen level of availability. I focused this discussion on blueprints for building simple high-availability systems on Win2K systems. Although more complex applications—such as Microsoft Exchange 2000 Server and Microsoft SQL Server 2000—require more sophisticated configurations, you can still use many of the same building blocks I have provided in this article. (For detailed information about these building blocks, see "Related Articles in Previous Issues.")

Related Articles in Previous Issues

You can obtain the following articles from Windows & .NET Magazine's Web site at http://www.winnetmag.com.

Discuss this Article

Murat Yildirimoglu on Aug 13, 2002

Page 24 of the print article states that RAID 5 technology introduces additional fault tolerance by allocating portions of each disk in the array to parity data.
No, RAID 5 does not provide additional fault tolerance over mirroring. It is just another way of providing fault tolerance in which we have a more efficient fault tolerance (because mirroring means 50% efficiency, whereas the efficiency of RAID 5 exceeds 66%). It is efficient but it does not introduce any more fault tolerance.