High Availability - Frequently Asked Questions

This FAQ answers questions about High Availability on SAP systems. It covers general nomenclature as well as SAP-specific questions. Links to detailed information are available on the High Availability page on SDN.

What is a "High Availability Cluster"?

The aim of a High Availability Cluster (HAC) is to keep specific services within a system landscape available. It achieves this by using redundant cluster nodes to reduce downtime. A common HAC setup consists of two cluster nodes, which is the minimum for redundancy. If one server node crashes, the second node takes over the clustered services and thereby keeps them available. This setup is also called a "two-node cluster" or a "failover cluster". Depending on the availability requirements, the number of server nodes can be increased to further reduce the risk posed by the failure of a node.

What is "Switchover"?

"Switchover" refers to a planned switch from a primary server to a standby server, that is, one that takes place without a failure of the primary server. A switchover is always initiated by the system administrator.

What is "Failover"?

"Failover" refers to the unplanned switch from a primary server to a standby server when the primary server node fails. Unlike a switchover, a failover is performed automatically by the cluster software. Some cluster software, such as Microsoft Cluster Service, also provides an option for a manual failover for testing purposes.

What is "Fallback"?

"Fallback" refers to switching back from a secondary server node to the primary server node after a failover has occurred and the primary server node is available again. A fallback can be performed automatically by the cluster software or "intelligently", which means manually by the system administrator.

What is an "Active-Passive Cluster"?

An active-passive cluster consists of at least two independent server nodes. The primary server node performs all operations. A secondary node acts as a so-called "standby system".

In case of a system failure of the primary node, the cluster software automatically fails over to the standby server node, which starts the processes and resumes the work of the primary server node. A cluster group is active on only one server node at a time.

Please note that an active-passive cluster configuration does not imply that the standby server node carries no workload. The active-passive configuration refers only to the cluster group: the resource can be active on only one server node at a time. However, if a cluster environment contains more than one cluster group, these groups can be distributed across the nodes of the cluster environment.

The SAP Central Services Instance is implemented as an active-passive cluster.

Active-Passive cluster in normal state

Active-Passive cluster in failover state
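The active-passive behaviour described above can be sketched as a toy model in Python. This is only an illustration, not how any cluster software is implemented: the class ClusterGroup, its fields, and the node names are hypothetical. It shows that a group is active on exactly one node at a time and that the "passive" node can still carry another group.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClusterGroup:
    """Hypothetical cluster group that is active on exactly one node at a time."""

    name: str
    nodes: list[str]                      # candidate nodes, in order of preference
    active_node: str = ""
    failed: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Start on the preferred node unless an active node was given explicitly.
        if not self.active_node:
            self.active_node = self.nodes[0]

    def fail(self, node: str) -> str:
        """Mark a node as failed and fail over if it was the active node."""
        self.failed.add(node)
        if node == self.active_node:
            survivors = [n for n in self.nodes if n not in self.failed]
            if not survivors:
                raise RuntimeError(f"no node left to run {self.name}")
            self.active_node = survivors[0]
        return self.active_node


# Two groups distributed over the same two nodes: node_b is the standby
# for "SCS" but still carries the workload of the second group.
scs = ClusterGroup("SCS", ["node_a", "node_b"])
other = ClusterGroup("other_group", ["node_b", "node_a"])
```

Calling scs.fail("node_a") moves the SCS group to node_b, while the second group is unaffected because it was already running there.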

What is a "Standby System"?

A standby system is a redundant cluster node that takes over the processes if the primary cluster server fails. This is referred to as a "failover" and is performed automatically by the cluster software. There can be several standby cluster nodes. Their number is limited only by the capabilities of the cluster software. For example, the Microsoft Cluster Service supports up to 8 server nodes, which means a maximum of 7 standby nodes.

Standby systems can be in a "hot" state or a "cold" state. A "hot standby" means that the processes also run on the standby node, so in case of a failure the cluster resource is already running and does not need to be started on the standby system. A "cold standby" means that in case of a failover the clustered resource needs to be started on the standby system, which causes a (short) downtime during the failover. The SAP Central Services Instance is usually implemented as a "cold standby" system because SCS is a lightweight component that does not take long to start.
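The arithmetic behind standby nodes and hot versus cold standby can be written down as a small Python sketch. The function names and the timing model (downtime as failure detection time plus, for a cold standby, resource startup time) are simplifying assumptions for illustration only; real failover times depend on the cluster software and the resource.

```python
def max_standby_nodes(cluster_nodes: int) -> int:
    """With one node running the resource, all remaining nodes can act as standby."""
    if cluster_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return cluster_nodes - 1


def failover_downtime(detection_s: float, startup_s: float, hot: bool) -> float:
    """Assumed model: a hot standby skips the startup time, a cold standby does not."""
    return detection_s + (0.0 if hot else startup_s)
```

For example, max_standby_nodes(8) gives 7, matching the MSCS figure above. With hypothetical values of 10 seconds for detection and 20 seconds for startup, the model yields 10 seconds of downtime for a hot standby and 30 seconds for a cold standby.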

What is an "Active-Active Cluster"?

An active-active cluster consists of at least two independent server nodes. The workload within a cluster resource is shared between the server nodes. If a cluster node crashes, its processes are resumed by the remaining cluster nodes. An active-active cluster configuration means that a cluster resource is active on all cluster nodes. The aim of an active-active cluster is not only to provide a highly available system but also to distribute the workload between the cluster nodes. Applications with a very high workload, such as databases, benefit from an active-active setup. Because the SAP Central Services Instance is a lightweight component, an active-active setup offers no benefit for it. The SCS is therefore implemented as an active-passive cluster resource.
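A minimal Python sketch of the active-active idea follows. The function distribute and the round-robin policy are assumptions chosen for illustration; real cluster software and databases use their own load distribution mechanisms.

```python
from __future__ import annotations


def distribute(requests: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Assign requests to the available nodes in round-robin order."""
    if not nodes:
        raise ValueError("no cluster node available")
    assignment: dict[str, list[str]] = {node: [] for node in nodes}
    for index, request in enumerate(requests):
        assignment[nodes[index % len(nodes)]].append(request)
    return assignment
```

With two nodes, distribute(["r1", "r2", "r3", "r4"], ["node_a", "node_b"]) splits the requests evenly. If node_a crashes, calling the function with only ["node_b"] places all requests on the remaining node, which mirrors how the surviving nodes resume the work.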

Shared Nothing

Shared All

What does "virtualization" mean?

The term "virtualization" in the context of HA refers to a kind of abstraction performed by the cluster software. The software creates a virtual host that owns a virtual hostname, a virtual disk, and so on. "Virtual" in this sense means that such resources are not tied to a single physical machine but can be owned by any of the machines in the cluster. The cluster software manages which node currently owns or runs a resource. Related resources are usually grouped into logical containers (for example, groups on MSCS or packages on HP Serviceguard) that can fail over independently.

What do "shared nothing" and "shared all" mean?

The terms "shared nothing" and "shared all" specify a type of architecture within an active-active cluster. "Shared nothing" means that every cluster node has its own data partition. This implies that such a setup is not highly available, because when a node fails, the data of the failed node is no longer available. In a "shared all" environment, the cluster nodes that run the same service share a data partition and access the data concurrently.

These options have to be supported by the cluster software. For example, MSCS does not support the "shared all" option.
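The difference in data availability between the two architectures can be illustrated with a short Python sketch. The function reachable_partitions and the node and partition names are hypothetical; the model only tracks which node can access which data partition.

```python
from __future__ import annotations


def reachable_partitions(layout: dict[str, set[str]], failed: set[str]) -> set[str]:
    """Return the data partitions still reachable through a surviving node."""
    reachable: set[str] = set()
    for node, partitions in layout.items():
        if node not in failed:
            reachable |= partitions
    return reachable


# Shared nothing: each node has its own data partition.
shared_nothing = {"node_a": {"part_1"}, "node_b": {"part_2"}}
# Shared all: both nodes access the same data partition.
shared_all = {"node_a": {"data"}, "node_b": {"data"}}
```

After a failure of node_a, the shared-nothing layout loses access to part_1, while the shared-all layout still reaches the complete data set through node_b.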

What is a "SPOF"?

A Single Point of Failure (SPOF) is any component within a system whose failure causes the loss of a runtime-critical service. A SPOF can be a hardware or a software component. However, this FAQ only covers the potential software SPOFs identified by SAP: the Message Server, the Enqueue Server, the central file system, and the database. For High Availability it is necessary to eliminate these SPOFs.

Be aware that it is possible to introduce additional SPOFs through configuration and programming by adding critical, non-redundant components yourself. These must be identified through an analysis and either eliminated or covered by failover services.
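As a rough illustration of such an analysis, the following Python sketch flags components that are critical, exist only once, and are not covered by a failover service. The Component type, its fields, and the example landscape are assumptions for illustration, not an SAP tool or data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical description of one component in a system landscape."""

    name: str
    instances: int            # number of redundant instances
    failover_protected: bool  # covered by a failover service
    critical: bool = True     # loss would interrupt a runtime-critical service


def find_spofs(components: list[Component]) -> list[str]:
    """Return names of critical components that are neither redundant nor protected."""
    return sorted(
        c.name
        for c in components
        if c.critical and c.instances < 2 and not c.failover_protected
    )


landscape = [
    Component("Message Server", 1, failover_protected=True),
    Component("Enqueue Server", 1, failover_protected=True),
    Component("central file system", 1, failover_protected=False),
    Component("Database", 1, failover_protected=False),
    Component("application server", 3, failover_protected=False),
]
```

In this made-up landscape, find_spofs(landscape) reports the central file system and the database, because the central services are assumed to be clustered and the application servers are redundant.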

SAP High Availability Setup

How can I set up a highly available system?

SAP supports the installation of highly available systems in several respects. However, High Availability requires additional software that is not delivered by SAP. In addition, it is important to analyze your system to make sure that no single points of failure are overlooked, because it is possible to introduce them through system configuration or custom software. We recommend engaging experienced consultants for this analysis.

In general, SAP systems are made highly available through their technology components, which are the application servers. However, it is also the responsibility of a running program not to introduce additional single points of failure. At SAP this is ensured through extensive quality management; for custom development it should be considered carefully.

What is ASCS/SCS?

With SAP NetWeaver 04 Java, the Message Server and the Enqueue Server are separated from the Central Instance. These two services are grouped within the SAP Central Services Instance (SCS). As of NW04s, the ABAP Central Services can also be separated from the Central Instance. Each stack, ABAP and Java, has its own Message Service and Enqueue Service. For ABAP systems the Central Services are referred to as ASCS; for Java systems they are referred to as SCS. The ASCS and the SCS are classified as SPOFs and therefore require a High Availability setup. If the ABAP central services are integrated within the ABAP Central Instance (the standard in NetWeaver 04), the Central Instance of the ABAP system also needs an HA setup.

What is the difference between HA for ABAP and HA for Java?

In SAP NetWeaver 6.40 ABAP, the Message Server and the Enqueue Server are integrated within the ABAP Central Instance (CI).

In SAP NetWeaver 6.40 Java, the Message Server and the Enqueue Server are implemented as services within the SAP System Central Services Instance (SCS) and are thereby separated from the Central Instance (CI).

With SAP NetWeaver 04s ABAP, the Message Server and the Enqueue Server can also be separated from the Central Instance (CI) in the ABAP stack and moved to the ABAP SAP Central Services Instance (ASCS). This is recommended for HA setups because the ASCS is a lightweight component that can be switched over easily.

Is SAP HA also supported for heterogeneous system landscapes?

SAP does not officially support heterogeneous HA cluster environments at the moment. However, preparations for official support of this type of setup are currently being evaluated.

Does SAP support Microsoft Geo clusters?

First of all, Geo-Clustering (Microsoft's name for it is "geographically dispersed clusters") is not a separate product from Microsoft. It is just a special setup of the normal MSCS (or, to be more precise, Microsoft Server Cluster 2003) configuration of Windows Server 2003.

Geospan clusters are not supported by SAP in terms of configuration and installation.

The standard configuration of a two-node MSCS cluster consists of two cluster nodes and a shared storage. All technical components are located in the same data center. In a geographically dispersed cluster, the cluster nodes are distributed across at least two data centers, and you need a more complex storage architecture, because a shared storage can only be located in one data center and would become a single point of failure. To eliminate this potential storage SPOF, you have to configure one storage box in each data center and set up storage replication between those boxes.

Replication can be synchronous or asynchronous, depending on the functionality of the storage subsystem, the accepted amount of data loss during a failover, the physical layout of the storage area network (distance between the storage boxes, signal latency, capacity and speed of the network connection), and, last but not least, the budget of the customer and the functionality supported by the database vendor.

Often the database components in geospan configurations are no longer part of the MSCS, and the database is replicated by pure database techniques (shadow database, log shipping, mirrored database).

Many different geospan configurations are possible. They are more or less planned individually for the customer by storage and hardware vendors in customer-specific projects. The numerous variants of geospan configurations and the complexity of the technical requirements are the primary reasons why those configurations are not directly supported by SAP.

The hardware vendors are responsible for the configuration, setup, and installation of the SAP system, and for ensuring that the basic HA technology works in this environment.

SAP still supports the basic functionality of the ABAP, J2EE, and database servers and of other SAP components in these configurations.

Standard SAP installation procedures are normally used during the SAP system installation of those configurations. However, depending on the chosen configuration, some steps are different. Here again, the hardware vendor is responsible for delivering the information and support or for performing the installation.

Many SAP customers already run geospan configurations.
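The trade-off between synchronous and asynchronous replication with respect to data loss can be sketched in Python as follows. The class ReplicatedStorage is a hypothetical toy model and does not represent any storage vendor's replication product: synchronous mode copies each write to the second storage box before it completes, while asynchronous mode ships writes later in batches.

```python
from __future__ import annotations


class ReplicatedStorage:
    """Toy model of one storage box per data center with replication between them."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: list[str] = []     # storage box in data center 1
        self.secondary: list[str] = []   # storage box in data center 2
        self._pending: list[str] = []    # written locally, not yet replicated

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            # The write completes only after the remote copy exists.
            self.secondary.append(record)
        else:
            self._pending.append(record)

    def ship(self) -> None:
        """Asynchronous mode: transfer all pending records to the second box."""
        self.secondary.extend(self._pending)
        self._pending.clear()

    def lost_on_failover(self) -> list[str]:
        """Records that would be missing if the first data center failed now."""
        return list(self._pending)
```

In this model a synchronous setup never has pending records, so a failover loses nothing, at the cost of every write waiting for the remote site. An asynchronous setup loses whatever was written since the last ship() call.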

Is there a session failover mechanism for SAP NetWeaver AS Java?

NetWeaver AS Java supports a session failover mechanism using the database, local persistence, or shared memory (7.1), which can be implemented in applications. Please refer to the documentation for further information on how to do that: see Failover System for version 7.0 or Configuring Shared Memory for version 7.1.

What is an "Enqueue Replication Server"?

The Enqueue Server contains the central locking table for the SAP cluster. Besides database locks, the table also contains infrastructure locks on system-wide objects. It is therefore necessary to protect the locking table against a failure of the Standalone Enqueue Server. The SAP Enqueue Replication Server provides a replication mechanism for the Enqueue Server by holding a copy of the locking table in its shared memory segment. After a failure of the Enqueue Server, the locking table can be restored from this copy. Since SAP NW04 SP15/ NW04s SR1, an automated installation of the Enqueue Replication Server is available for Windows environments. UNIX/Linux installations are handled by SAP hardware partners.

Note: You can only protect Standalone Enqueue Servers with an Enqueue Replication Server. The standard Enqueue Server in an ABAP CI (enqueue work process) cannot be protected by an Enqueue Replication Server.
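The principle of keeping a copy of the locking table can be illustrated with a toy Python model. The classes below and their methods are hypothetical and do not reflect the actual SAP enqueue protocol or its interfaces; they only show why a restarted enqueue server that restores its table from the replica still honours locks granted before the failure.

```python
from __future__ import annotations


class ReplicationServer:
    """Holds a copy of the locking table (toy stand-in for the replica)."""

    def __init__(self) -> None:
        self.copy: dict[str, str] = {}


class EnqueueServer:
    """Toy lock server that mirrors every change to the replication server."""

    def __init__(self, replica: ReplicationServer,
                 initial: dict[str, str] | None = None) -> None:
        self.locks: dict[str, str] = dict(initial or {})
        self.replica = replica

    def lock(self, obj: str, owner: str) -> bool:
        if obj in self.locks:
            return False                  # already locked by someone
        self.locks[obj] = owner
        self.replica.copy[obj] = owner    # replicate the new lock
        return True

    def unlock(self, obj: str) -> None:
        self.locks.pop(obj, None)
        self.replica.copy.pop(obj, None)  # replicate the release
```

A usage sketch: create replica = ReplicationServer() and server = EnqueueServer(replica), then take a lock with server.lock("ORDER/4711", "user1"). After a simulated crash, EnqueueServer(replica, initial=replica.copy) starts with the replicated table, so a second user cannot obtain the same lock.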

Is an "Enqueue Replication Server" necessary for setting up an SAP HA system landscape?

As of version NW04 SP15, the Replication Server is required for all SAP Java HA scenarios. The Replication Server is required for all SAP ABAP HA scenarios as well.

Is SAP NetWeaver AS Java available also if a central service or the database fails?

The SAP NetWeaver Application Server Java uses the database extensively. A loss of the database is very critical for the functionality of AS Java. However, some rudimentary functions and cached information are still available. Once the database is available again, the engine returns to a fully operational state. The picture below shows the approximate impact of a loss of the database and the central services. Please note that the diagram is only a rule of thumb and not an empirical study.

In conclusion, measures are needed to prevent such a loss. Providing them is the purpose of High Availability environments.

What is the difference between Enqueue Server and Standalone Enqueue Server?

Since NW04 Java, the Enqueue Server and the Message Server for a J2EE Engine are standalone services hosted by the SAP Central Services instance (SCS). The difference between a Standalone Enqueue Server and the Enqueue Service is therefore only formal: the term "service" refers to the Enqueue as part of the SCS, while the term "server" refers to the Enqueue as a process, either enserver (.exe) or the enqueue work process within the ABAP stack. As of NW04s, the Enqueue Server and the Message Server are also implemented as services within the ABAP SAP Central Services Instance (ASCS).

Are my transactions lost after a database failover?

The impact of a database loss depends on the implementation of the database. Some databases support a session failover mechanism, others do not. Please consult the database-specific documentation for further information.

Is there a certification for SAP HA Solutions?

In April 2012, SAP introduced the SAP Application Server HA-Interface Certification. Please find further details about the certification here.

If you are interested in partners offering HA solutions, you can find the current list on SDN here.

Who performs the High Availability setups for SAP Solutions?

The setup can be performed by SAP-certified consultants. In non-MSCS environments on Windows and in UNIX environments, information about High Availability has to be supplied by the HA vendor, or the setup has to be performed in collaboration with the partner. Typically, HA scenarios are set up and delivered by third-party vendors, who deliver the hardware, operating system, and database together and also commit to the requested SLA toward the customer.

Does SAP NetWeaver AS ABAP, AS Java and Composition Environment support High Availability?

Yes. They all do, under the conditions described in this FAQ and in the manuals.

SAP High Availability and the Windows environment

How does SAP support High Availability in Windows environments?

SAP provides an out-of-the-box setup for Windows cluster environments using the Microsoft Cluster Service (MSCS). For all other Windows cluster solutions, please get in touch with the respective cluster software vendor.

What is a cluster resource group?

Cluster resources are software and hardware components that are managed by the cluster software. Several resources can be grouped into resource groups. These groups are collections of resources which can be managed by the cluster service as a single unit.
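The idea of managing several resources as a single unit can be sketched in Python. The class ResourceGroup and the example resource names are hypothetical and only illustrate that moving the group moves all of its resources together.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ResourceGroup:
    """Hypothetical resource group: its resources always share one owner node."""

    name: str
    owner: str
    resources: list[str]

    def move_to(self, node: str) -> None:
        # Moving the group moves every contained resource at once.
        self.owner = node

    def placement(self) -> dict[str, str]:
        """Map each resource to the node that currently owns it."""
        return {resource: self.owner for resource in self.resources}


group = ResourceGroup(
    name="example_group",
    owner="node_a",
    resources=["virtual hostname", "shared disk", "central services instance"],
)
```

After group.move_to("node_b"), every entry returned by group.placement() points to node_b, so the virtual hostname, the disk, and the instance never end up on different nodes.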

Why can't I access the shared disks when they are part of another cluster node?

The shared disk always belongs to the cluster node that currently owns the cluster group. However, you should never access the SAP installation through the disk drive but through the share name, that is, "/sapmnt". This prevents losing access to the disk during sensitive operations such as installations.

How many failover nodes are currently supported by MSCS/SAP?

The number of SAP cluster nodes is limited only by the maximum number of cluster nodes supported by MSCS. For Windows Server 2003 the maximum is 8 cluster nodes, which means 7 failover nodes. When the SAP Replication Server is used in a cluster configuration, the number of nodes is currently limited to two.

How can I initiate a failover on MSCS manually?

On MSCS, failovers can be initiated manually for testing purposes. You can open a command prompt and type "cluster res /FAIL", or open the Cluster Administrator, right-click a resource, and choose "Initiate Failover".

What is the "Threshold"?

Within MSCS, the term "Threshold" refers to the number of failures that are tolerated before a resource fails over to the second cluster node. Please note that if you are using the Replication Server, it is very important to set the "Threshold" to "0". Otherwise the replication will not work.
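The effect of the threshold value can be illustrated with a small Python sketch. The function below is a simplified assumption of the behaviour described above, not the MSCS implementation: failures up to the threshold are handled by a local restart, and the next failure triggers a failover.

```python
def action_on_failure(failure_count: int, threshold: int) -> str:
    """Return the assumed cluster reaction to the n-th failure of a resource."""
    if failure_count < 1 or threshold < 0:
        raise ValueError("failure_count must be >= 1 and threshold >= 0")
    return "failover" if failure_count > threshold else "restart"
```

With a threshold of 0, the very first failure already results in a failover, which is the setting required when the Replication Server is used. With a hypothetical threshold of 3, the first three failures lead to local restarts and the fourth leads to a failover.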

Are there other cluster environments available for the Microsoft Windows platform?

SAP supports an out-of-the-box installation for the Microsoft Cluster Service. However, there are also additional cluster software vendors for Windows.

As in UNIX environments, these setups are handled by the partner.

Where can I find additional information regarding the Microsoft Cluster Service?