A hot spare, warm spare, or hot standby is a failover mechanism used to provide reliability in system configurations. The hot spare is active and connected as part of a working system. When a key component fails, the hot spare is switched into operation. More generally, hot standby can refer to any device or system that is held in readiness to overcome an otherwise significant start-up delay.

In designing a reliable system, it is recognized that there will be failures. At the extreme, a complete system can be duplicated and kept up to date—so in the event of the primary system failing, the secondary system can be switched in with little or no interruption. More often, a hot spare is a single vital component without which the entire system would fail. The spare component is integrated into the system in such a way that in the event of a problem, the system can be altered to use the spare component. This may be done automatically or manually, but in either case it is normal to have some means of error detection. A hot spare does not necessarily give 100% availability or protect against temporary loss of the system during the switching process; it is designed to significantly reduce the time that the system is unavailable.
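The switch-over described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference design: the `Component` and `FailoverSystem` classes, the `healthy` flag, and the use of an exception as the error-detection signal are all assumptions made for the example.

```python
class Component:
    """Hypothetical component; the 'healthy' flag stands in for real hardware state."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        # A failed component signals the error by raising an exception.
        if not self.healthy:
            raise RuntimeError(f"{self.name} has failed")
        return f"{self.name} handled {request}"


class FailoverSystem:
    """Routes work to the active component and switches to the spare on error."""

    def __init__(self, primary, spare):
        self.active = primary
        self.spare = spare
        self.failovers = 0

    def handle(self, request):
        try:
            return self.active.handle(request)
        except RuntimeError:
            # Error detected. With no spare left, the failure propagates.
            if self.spare is None:
                raise
            # Switch the spare into operation and retry the request once.
            self.active, self.spare = self.spare, None
            self.failovers += 1
            return self.active.handle(request)


primary = Component("primary")
spare = Component("spare")
system = FailoverSystem(primary, spare)

first = system.handle("r1")   # served by the primary
primary.healthy = False       # simulate a failure of the key component
second = system.handle("r2")  # served by the spare after the switch
```

The retry inside `handle` corresponds to the brief interruption during switching: the request is delayed rather than lost, but a second failure before the original component is repaired would still bring the system down.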

Hot standby can carry a slightly different connotation from hot spare: it describes a state (active but not productive) rather than an object. For example, in a national power grid, the supply of power must be balanced against demand over short periods. It can take many hours to bring a coal-fired power station up to productive temperatures. To allow for load balancing, generator turbines may be kept running with the generators switched off, so that as peaks of demand occur the generators can rapidly be switched on to balance the load. This state of readiness to run is known as hot standby. The practice is not a modern one: steam train operators might hold a spare steam engine fired up at a terminus, because starting an engine from cold would take a significant amount of time.

The spare may be a similar component or system, or it may be a system of reduced performance, designed to cope only for the time needed to repair and recover the original component. In high-availability systems, it is common to design so that not only is there a spare that can quickly be switched in, but the failed component can also be repaired or replaced without stopping the system; this is known as hot swapping. Alternatively, the probability of a second failure may be considered low, in which case the system is designed simply to allow operation to continue until a suitable maintenance period. The appropriate solution is normally determined by balancing the cost of implementing the availability against the likelihood and severity of a problem. There are two types of hot standby: (1) master-slave hot standby and (2) hot standby in sharing mode.

A hot spare disk is a disk or group of disks used to replace a failing or failed disk in a RAID configuration, either automatically or manually depending on the hot spare policy. The hot spare disk reduces the mean time to recovery (MTTR) for the RAID redundancy group, thus reducing the probability that a second disk fails before redundancy is restored, with the resultant data loss that would occur in any singly redundant RAID (e.g., RAID-1, RAID-5, RAID-10). Typically, a hot spare is available to replace any of a number of different disks, and systems employing a hot spare normally require a redundant group to allow time for the data to be regenerated onto the spare disk. During this time the system is exposed to data loss from a subsequent failure, so automatic switching to a spare disk shortens the exposure to that risk compared with manual discovery and replacement.
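The effect of a shorter exposure window can be illustrated with a rough calculation. The sketch below assumes that disk failures are independent and occur at a constant rate (an exponential model), which is a simplification; the function name, the array size, the MTBF figure, and both window lengths are hypothetical values chosen only for illustration.

```python
import math


def second_failure_probability(remaining_disks, mtbf_hours, exposure_hours):
    """Probability that at least one remaining disk fails during the exposure window.

    Assumes independent disks with a constant failure rate of 1 / mtbf_hours each.
    """
    combined_rate = remaining_disks / mtbf_hours
    return 1.0 - math.exp(-combined_rate * exposure_hours)


# Hypothetical array: 8 disks, one has failed, leaving 7 at risk.
REMAINING = 7
MTBF_HOURS = 1_000_000  # assumed per-disk mean time between failures

# Assumed windows: automatic rebuild onto a hot spare versus
# manual discovery, replacement, and then the same rebuild.
p_hot_spare = second_failure_probability(REMAINING, MTBF_HOURS, exposure_hours=10)
p_manual = second_failure_probability(REMAINING, MTBF_HOURS, exposure_hours=58)

print(f"hot spare: {p_hot_spare:.6%}  manual: {p_manual:.6%}")
```

Under this model the risk grows almost linearly with the length of the window while the probabilities are small, so cutting the time to start the rebuild reduces the chance of data loss roughly in proportion.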

The concept of hot spares is not limited to hardware; software systems can also be held in a state of readiness. For example, a database server may have a software copy on hot standby, possibly even on the same machine, to cope with the various factors that make a database unreliable, such as disk failure, poorly written queries, or database software errors.

In a hot-standby configuration for control systems, at least two units of the same type are powered up, receiving the same set of inputs, performing identical computations, and producing identical outputs in a nearly synchronous manner. The outputs are typically physical outputs (individual ON/OFF digital signals or analog signals) or serial data messages wrapped in suitable protocols, depending on their intended use. Outputs from only one unit (designated the master or on-line unit by application logic) are used to control external devices (such as switches, signals, and on-board propulsion or braking control devices) or simply to drive displays. The other unit is the hot-standby or hot spare unit, ready to take over if the master unit fails. When the master unit fails, an automatic failover to the hot spare occurs within a very short time, and the outputs from the hot spare, now the master unit, are delivered to the controlled devices and displays. The controlled devices and displays may experience a short blip or disturbance during the failover, but they can be designed to tolerate or ignore such disturbances so that overall system operation is not affected.
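The master and standby arrangement can be sketched as follows. This is an illustrative model only: the `Unit` and `RedundantPair` classes, the use of `None` to represent a failed unit's missing output, and the trivial control law (output ON if any input is ON) are assumptions made for the example, not a description of any particular product.

```python
class Unit:
    """Hypothetical processing unit; 'failed' simulates a unit that has stopped."""

    def __init__(self, name):
        self.name = name
        self.failed = False

    def compute(self, inputs):
        # A failed unit produces no output at all, modelled here as None.
        if self.failed:
            return None
        # Stand-in control law: output is ON if any input is ON.
        return any(inputs)


class RedundantPair:
    """Two units compute every cycle; only the master's output is delivered."""

    def __init__(self, unit_a, unit_b):
        self.master = unit_a
        self.standby = unit_b

    def cycle(self, inputs):
        # Both units receive the same inputs and perform the same computation.
        master_out = self.master.compute(inputs)
        standby_out = self.standby.compute(inputs)
        if master_out is None and standby_out is not None:
            # Failover: the standby becomes the master and its output is used.
            self.master, self.standby = self.standby, self.master
            master_out = standby_out
        return self.master.name, master_out


unit_a, unit_b = Unit("A"), Unit("B")
pair = RedundantPair(unit_a, unit_b)

before = pair.cycle([0, 1])  # unit A is master and drives the output
unit_a.failed = True         # simulate failure of the master unit
after = pair.cycle([0, 1])   # unit B takes over within the same cycle
```

Because the standby has been computing all along, it already holds a valid output when the master fails, which is what keeps the failover short.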