New Approach to High Availability Computer Power System Design

The ideal computer power system for data centers these days is a tall order for electrical designers. Just stop and think about all of the requirements. The system must be available nearly 100% of the time, easily expandable, simple to maintain, fault-tolerant, and, most importantly, cost-effective. Does such a system even exist? Here's a design that meets these criteria through the use of standard, non-proprietary products in a unique system architecture.

Called a tri-isolated redundant design, this system provides three distinct distribution paths, along with associated equipment, to serve the load of two. Normally, all three paths are active, with each path supporting a third of the total computer load. However, if any distribution path or component in that path fails, the loads normally supplied by it are transferred seamlessly to the two surviving paths.

The benefits of this design are illustrated through a supporting one-line diagram (see Figure), which depicts one modular block of computer load rated at 750kW. However, the design presented allows for a total of four blocks of computer load, each rated at 750kW. This selection is based on an efficient rating of UPS input switchboards. You can readily adjust the number of load blocks and total computer load to be served to accommodate facility computer load requirements.

UPS switchboards. The computer power system design originates at the UPS input switchboard, which is served via a utility source and a generator source. If the utility source fails, loads supported by this switchboard will automatically transfer to the standby generator source. Upon restoration of the utility source, the loads will automatically retransfer to the utility source. The design incorporates three such switchboards designated IS-A, IS-B, and IS-C.

Normally all three input switchboards are energized with each supporting approximately one third of the total computer load. However, the input switchboards are rated such that if any one switchboard fails, the two surviving switchboards will have sufficient capacity to support all computer loads.

As noted on the diagram, the capacity of each input switchboard is 3,000A. Assuming four 750kW blocks of computer load are installed, the normal load on each switchboard is 1,600A, and the load imposed on each of the two surviving input switchboards, should any one fail or require maintenance, is 2,400A.
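The switchboard figures follow from simple load sharing, which the short Python sketch below reproduces. The function name and the assumption that load divides evenly among the energized switchboards are ours for illustration; the ampere values are the ones quoted above.

```python
def per_path_load(total_load, paths, failed=0):
    """Load carried by each surviving path when the total is shared evenly."""
    surviving = paths - failed
    if surviving <= 0:
        raise ValueError("no surviving paths")
    return total_load / surviving

SWITCHBOARD_RATING_A = 3000   # rating of each UPS input switchboard
NORMAL_LOAD_A = 1600          # normal load per switchboard, four blocks installed
TOTAL_A = NORMAL_LOAD_A * 3   # total shared by IS-A, IS-B, and IS-C

normal = per_path_load(TOTAL_A, paths=3)
one_out = per_path_load(TOTAL_A, paths=3, failed=1)

# Compare the single-failure load against the switchboard rating
print(normal, one_out, one_out <= SWITCHBOARD_RATING_A)
```

With one of three paths out of service, each survivor picks up 1.5 times its normal load, so every path must be sized with at least that much headroom.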

Each UPS input switchboard is equipped with eight 800A breakers to provide separate feeders to the rectifier inputs and the bypass inputs of four UPS systems. Separate feeders are provided to prevent a fault on the rectifier input from de-energizing the bypass input. As noted above, we based this selection on an efficient rating point that integrates well with other system components. Moreover, it applies standard non-proprietary products.

Let's focus on the attributes of one typical 750kW block of computer load as shown on the diagram. Additional 750kW blocks are identical.

UPS systems. Each input switchboard supplies power to an individual UPS system. Each UPS system includes a static switch, designated as “SS.” This switch is integral to the UPS system and provides a seamless transfer of critical loads from UPS output to utility or generator source, should the UPS system experience an internal failure. The one-line diagram shows a conventional double conversion UPS system; however, this design works equally well with rotary UPS and line interactive UPS.

As noted, the capacity of each UPS system is 400kW, while the normal load is 250kW. The load imposed on each of the two surviving UPS systems — should any one fail or require maintenance — is 375kW.

The three UPS systems are provided with load bus synchronization controls to keep them within acceptable voltage limits and phase tolerances, enabling downstream static switches to transfer between them without affecting computer operation. These controls also keep the three UPS systems within acceptable limits when any or all systems are operating on batteries, generators, or utility power.

We based the selection of UPS systems on an efficient rating point that integrates well with the capacities of other system components. Moreover, it meets our objective to apply only standard, non-proprietary products.

UPS batteries. Each UPS system is provided with an independent UPS battery. If the input power to any UPS system fails, the UPS batteries will continue to supply power to the computer load for the duration of their protection period.

Again referring to the Figure, the protection period selected is 15 minutes at 400kW computer load (full UPS load). However, the normal computer load on each UPS system is 250kW, and the load imposed on each of the two surviving UPS systems under failure/maintenance conditions is 375kW. Because these loads are less than the battery design load, the protection period extends to 40 minutes under normal conditions and 17 minutes under failure/maintenance conditions.
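Battery run time does not scale linearly with load, which is why the protection period at 250kW is well beyond the 24 minutes a constant-energy estimate would suggest. As a rough illustration only, the Python sketch below fits a power-law curve through the two points stated above (15 minutes at 400kW and 40 minutes at 250kW) and evaluates it at 375kW. The power-law form and the fitted exponent are assumptions made for this sketch; actual protection periods come from the battery manufacturer's discharge data.

```python
import math

# Points stated in the article (load in kW, protection period in minutes)
P_FULL, T_FULL = 400.0, 15.0
P_NORMAL, T_NORMAL = 250.0, 40.0

# Assumed model: t = T_FULL * (P_FULL / P) ** k, with k fitted to the two points
k = math.log(T_NORMAL / T_FULL) / math.log(P_FULL / P_NORMAL)

def runtime_minutes(load_kw):
    """Estimated protection period under the assumed power-law model."""
    return T_FULL * (P_FULL / load_kw) ** k

def constant_energy_minutes(load_kw):
    """Naive estimate that ignores rate effects (energy = power x time)."""
    return T_FULL * P_FULL / load_kw

print(round(k, 2))
print(round(runtime_minutes(375.0), 1))
print(constant_energy_minutes(250.0))
```

Under this assumed curve, the estimate at 375kW lands close to the 17 minutes quoted for failure/maintenance conditions.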

The battery technology selected is vented lead calcium type, with a pasted plate. This is a robust and highly reliable battery technology and should provide a service life of 20 years. The batteries are applied at an efficient rating point allowing application of a four-cell jar. Again, the batteries are standard, non-proprietary products.

UPS output switchboards. Conditioned UPS power is distributed from each UPS output switchboard to a series of static transfer switches. Each switchboard is rated for the full load output power of its respective UPS system. Also, each switchboard contains an 800A main breaker and four 400A, 100% rated output breakers to distribute power to downstream static switches. Normally, all three of these switchboards are energized, each supporting one third of the load of the computer block. However, the switchboards are rated such that if any one switchboard fails, the two surviving switchboards will have sufficient capacity to support all computer loads.

Referring to the capacity analysis data in the Figure, the capacity of each UPS output switchboard is 800A (640A continuous), while the normal load is 320A and the load under failure/maintenance conditions is 480A. Again, this capacity provides an efficient rating point that integrates well with other system components and allows use of standard, non-proprietary products.

Static transfer switches. Loads are transferred between UPS systems by a series of six static transfer switches, designated as SS-AB, SS-AC, SS-BA, SS-BC, SS-CA, and SS-CB. Each of these is rated at 480V, 400A continuous. Essentially, these devices are electronic transfer switches. Two input sources are provided; one is the preferred source while the other is the alternate source. The load is normally supplied by the preferred source. If this source unexpectedly fails, the load is automatically transferred to the alternate source.

These transfers are open transition (break-before-make) but very fast. They occur in less than one-quarter cycle (about 4 milliseconds at 60Hz), so connected computer loads, which can ride through this brief outage, remain in operation.

The input sources to the static switches are distributed between the three UPS systems in a specific pattern to achieve the desired results. The pattern is noted on the diagram through the use of an alphanumeric designation for each output from a UPS output switchboard and a corresponding designation on the static transfer switch input connected to this output. Each static transfer switch has a different set of UPS inputs as follows: C1-B1, A1-B2, B3-C2, A2-C3, C4-A3, and B4-A4. The first input is the preferred feeder while the second is the alternate feeder. This is denoted on the drawing by the position of the automatic transfer switch.

The objective of this connection pattern is to provide an operating mode where if any one UPS source fails, its loads are transferred to the two surviving UPS sources, such that half of the load is transferred to each surviving source. If you carefully study the selected pattern, you will see that it achieves the desired result.
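For readers who would rather check the pattern than trace it by eye, here is a minimal Python sketch that applies the six preferred/alternate pairs listed above and removes each UPS source in turn. The even split of 125kW per static transfer switch and the function name are assumptions for illustration; the feeder pairs are taken directly from the text.

```python
# Static transfer switch feeds as (preferred UPS, alternate UPS), from the text.
# Keys are the feeder designations, not the SS-xx equipment tags.
FEEDS = {
    "C1-B1": ("C", "B"),
    "A1-B2": ("A", "B"),
    "B3-C2": ("B", "C"),
    "A2-C3": ("A", "C"),
    "C4-A3": ("C", "A"),
    "B4-A4": ("B", "A"),
}
BLOCK_KW = 750.0
SWITCH_KW = BLOCK_KW / len(FEEDS)  # assumes each switch carries an equal share

def ups_loads(failed=None):
    """Return the kW carried by each UPS, optionally with one UPS out of service."""
    loads = {ups: 0.0 for ups in "ABC" if ups != failed}
    for preferred, alternate in FEEDS.values():
        # A switch stays on its preferred source unless that source has failed
        source = alternate if preferred == failed else preferred
        loads[source] += SWITCH_KW
    return loads

print("normal:", ups_loads())
for ups in "ABC":
    print(ups, "out:", ups_loads(failed=ups))
```

Each UPS is the preferred source for exactly two switches, and those two switches have different alternates, so a failed source always hands half of its load to each survivor.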

Normally, all six static transfer switches are in operation and connected to their preferred source. Each is supporting approximately one sixth of the load of the computer block. However, if any static transfer switch fails, the surviving static transfer switches will have sufficient capacity to support all computer loads.

The capacity of each static transfer switch is 400A, while the normal load is 160A. The load under failure/maintenance conditions is 320A. Once again, the selected capacity provides an efficient rating point that integrates well with other system components and allows use of standard, non-proprietary products.

The static transfer switches are denoted on the one-line as standard transfer switches to show their operation in this system. However, they include a number of additional attributes, including integral bypass capability to allow maintenance without affecting connected loads, manual switching capability to facilitate maintenance of upstream equipment, and recent non-proprietary advances in static transfer switch control algorithms that resolve transformer saturation issues when static switches are applied upstream of transformers.

Computer power distribution units. This design also includes a series of power distribution units (PDUs) that are designed to receive 480V output from a static switch, reduce the voltage to 120/208V, and distribute power to computer loads. The PDUs are designated as PDU-AB, PDU-AC, PDU-BA, PDU-BC, PDU-CA, and PDU-CB.

A total of six PDUs, each rated at 300kVA, are provided for each 750kW block of computer load. An individual PDU is connected to the output of each static transfer switch. The six PDUs are arranged in pairs to serve three groups of dual-cord computer equipment, identified as Group 1, Group 2, and Group 3.

The pairing of the PDUs indicates that two PDUs are normally connected to each of the three UPS sources. In addition, the pairs are arranged such that the normal source to each PDU in a given pair is supplied from a different UPS source.

The objective of this pairing pattern is to provide an operating mode where if any PDU fails, its loads are transferred to the surviving PDU of the pair. Further, if any static transfer switch fails, its loads are transferred to the surviving PDU of the pair served by another static switch.

Normally all six PDUs are energized, each supporting approximately half the load of the respective load group. However, if any PDU or static switch fails, the surviving PDU will have sufficient capacity to support all the load of the respective load group.

The capacity of each PDU is 300kVA, while the normal load is 135kVA and the load under failure/maintenance conditions is 270kVA. Yet again, the selected capacity provides an efficient rating point that integrates well with other system components and allows use of standard non-proprietary products.
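To see the margins side by side, the Python sketch below gathers the ratings and loads quoted so far and expresses each load as a percentage of rating. All values are the ones stated in this article; using the 640A continuous figure as the UPS output switchboard rating is our choice for the comparison, and the table layout is purely illustrative.

```python
# (component, rating, normal load, failure/maintenance load, unit)
CAPACITY = [
    ("UPS input switchboard", 3000, 1600, 2400, "A"),
    ("UPS system", 400, 250, 375, "kW"),
    ("UPS output switchboard", 640, 320, 480, "A"),  # 800A frame, 640A continuous
    ("Static transfer switch", 400, 160, 320, "A"),
    ("PDU", 300, 135, 270, "kVA"),
]

def utilization(rating, load):
    """Load as a percentage of rating."""
    return 100.0 * load / rating

rows = []
for name, rating, normal, failure, unit in CAPACITY:
    row = (name, utilization(rating, normal), utilization(rating, failure))
    rows.append(row)
    print(f"{name:24s} {rating:>5d}{unit:<4s} "
          f"normal {row[1]:5.1f}%  failure {row[2]:5.1f}%")

# Component that runs closest to its rating under failure/maintenance conditions
worst = max(rows, key=lambda r: r[2])
print("tightest margin:", worst[0])
```

With these figures, every component stays within its rating under failure/maintenance conditions, and the UPS system runs closest to its limit.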

Dual-cord computer load groups. This design is configured to serve three groups of dual-cord computer load for each 750kW block of total computer load. The computer manufacturer configures the dual-cord computer equipment so that it operates when either one or both cords are energized. Typically, both cords are active and supply a portion of the dual-cord computer's load. If either cord fails, the computer internally transfers its entire load to the remaining cord.

This power distribution design integrates nicely with this attribute of dual-cord computer equipment by maintaining redundant paths of power flow to each computer's dual input terminals. This is achieved by connecting each cord to a different PDU supplied by a separate static transfer switch, UPS output switchboard, UPS system, UPS battery, and UPS input switchboard. Moreover, the design incorporates high-speed, automatic transfer capabilities via the static switches such that failure/maintenance of any component upstream of the static switches will not require the dual-cord computer to internally transfer.

Overall reliability analysis. The tri-isolated redundant design promises to exhibit very high levels of reliability due to its inherent redundancy, fault tolerance, simplicity, and maintainability. The inherent redundancy is evidenced by redundant paths of power flow to three groups of dual-cord computer equipment. Further, integration of static switches into the design provides an additional measure of protection over conventional designs, such that dual-cord computers do not need to switch internally for any failure/maintenance upstream of the static switch. Moreover, the static switches provide enhanced protection for any residual single-cord computer equipment.

Another aspect of reliability is fault tolerance. The design achieves high levels of fault tolerance through redundant paths of power flow throughout, coupled with multiple high-speed switching features to transfer between the redundant paths.

Fault isolation further enhances fault tolerance, through the application of relatively small power system components. So, failure/maintenance of any component affects only a small portion of computer load. For example, the failure of any UPS system requires transfer of only two static switches from their preferred source to their alternate source. The selection and setting of overcurrent protective devices to achieve selective coordination also enhances fault isolation.

As studies have shown, a majority of data center outages result from operator error. The tri-isolated redundant design should reduce this risk, through the simplicity of its design. It exhibits less complexity than typical designs, which should increase operator understanding and reduce operator errors. Moreover, the modular system architecture, with its identical electrical infrastructure for each 750kW block of computer load, should further enhance operator understanding.

One final aspect of reliability is ease of maintenance. This design accommodates concurrent maintenance, so any component can be taken out of service for preventive maintenance or repair without computer downtime. This capability results from the redundant components and distribution paths incorporated into the design, coupled with the multiple high-speed switching features.

In summary, the tri-isolated redundant design conforms to the requirements for a Tier IV electrical infrastructure:

It is fault tolerant and can sustain at least one worst-case unplanned failure or event without computer downtime; and

It accommodates concurrent maintenance of all system components.

Yester is chairman and design principal at Swanson Rink Consulting Engineers in Denver.

Sidebar: But How Much Does It Cost?

A detailed cost analysis shows the tri-isolated redundant design to be approximately 10% more expensive than a typical Tier III design. For reference, a typical Tier IV design is approximately 40% more expensive than a typical Tier III design.

The incremental cost of load growth for the tri-isolated redundant design is nearly proportionate to load, while the typical Tier III and Tier IV designs exhibit an uneven cost of load growth. This results from the modular expansion capability of the tri-isolated redundant design (as load requirements increase, equipment is added in 750kW blocks). Moreover, the tri-isolated redundant design delivers a Tier IV infrastructure for a modest premium over typical Tier III designs and provides substantial savings over typical Tier IV designs.

These cost savings result from applying standard products at an efficient rating point and incorporating them into an integrated computer power distribution system. In addition, this design should yield further cost benefits by substantially reducing delivery time, start-up/commissioning time, and overall project construction time.
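To put the sidebar's percentages on a common footing, the Python sketch below normalizes the typical Tier III cost to 1.00 and works out the implied savings relative to a typical Tier IV design. The normalization is our assumption for illustration, and because the 10% and 40% premiums are approximate, the result is approximate as well.

```python
TIER_III = 1.00                  # baseline cost, normalized
TRI_ISOLATED = TIER_III * 1.10   # approximately 10% over typical Tier III
TIER_IV = TIER_III * 1.40        # approximately 40% over typical Tier III

# Savings of the tri-isolated design expressed as a fraction of Tier IV cost
savings_vs_tier_iv = (TIER_IV - TRI_ISOLATED) / TIER_IV
print(f"{savings_vs_tier_iv:.1%}")
```

On these numbers, the tri-isolated redundant design comes in at a little over one-fifth less than a typical Tier IV design.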