Your SAP on Azure – Part 3 – High Availability for SAP NetWeaver on Windows

Deploying your system in the cloud doesn’t mean it will run 24 hours a day, 7 days a week. A system restart can happen for various reasons. Microsoft distinguishes two main scenarios that can impact your VM availability:

Unexpected downtime – when the physical infrastructure underlying the VM fails in some way

Planned maintenance events – periodic updates to the Azure platform

Today I would like to show you how you can prepare for the above scenarios and ensure your SAP system can be accessed all the time!


SAP NETWEAVER HIGH AVAILABILITY

There is already a very nice blog written by Juan Reyes, which explains what High Availability is and how to use it. But before we start with the configuration, I would like to remind you of some principles. In every SAP installation, we can distinguish three separate components:

Database

Central Services Instance (ASCS)

Application Instance

Each component is required to run the SAP system. The database stores all the data. The Central Services instance runs the message server, which distributes the requests, and the enqueue server, which manages the lock table. The application server actually processes the user requests.

When we deploy a simple landscape into Azure, the above components are contained in a single VM or split across two or three servers. This reduces the SAP system administration effort, but is it really safe?

As there is no redundancy, whenever any of the components fails, the entire system is unavailable. In this post, I focus on protecting the ASCS instance, which is the heart of every SAP system. I won’t discuss protecting the database, as each vendor has its own features and solutions.

It’s relatively easy to protect the Primary Application Server. The only required action is to install an additional application instance and point it to the same message server (ASCS). Distributed installations are popular, and they can also improve the overall performance of your system, as user activity can be balanced between multiple servers.

The tricky bit is the Central Services instance. It contains information about all the servers in your landscape and keeps the lock table entries. Without it, the SAP system will shut down within a few minutes and users won’t be able to connect.

To enable High Availability for the Central Services instance, you need shared storage that can be accessed by two VMs at the same time. Unfortunately, things that are easy on-premises can sometimes cause trouble in the cloud. At the time of writing, Azure doesn’t offer any solution that could work as shared storage for the servers. Therefore we will use an additional product called SIOS DataKeeper, which synchronizes the drives of two virtual machines in real time. If a node becomes inactive, it’s not a problem – you have an exact copy on the second drive. SIOS DataKeeper also takes care of the failover, so your files are constantly available. The presented solution is supported by Microsoft for SAP installations.

HIGH AVAILABILITY IN AZURE

With a proper configuration and VM redundancy (two or more VMs in an availability set), Microsoft guarantees a 99.95% Azure SLA. That means your solution may encounter at most about 4.4 hours of downtime per year (0.05% of 8,760 hours).
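To make the SLA arithmetic concrete, here is a minimal sketch that converts an availability percentage into the downtime it still allows. It assumes a 365-day year and a 30-day month; Azure actually expresses its VM SLA as a monthly uptime percentage, so the monthly figure is the one the SLA is measured against.

```python
# Convert an availability SLA percentage into the downtime it still allows.
# Assumption: a 365-day year (8,760 hours) and a 30-day month (720 hours).

HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = 30 * 24


def allowed_downtime_hours(sla_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime, in hours, permitted by an SLA over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("SLA must be between 0 and 100 percent")
    return period_hours * (100 - sla_percent) / 100


if __name__ == "__main__":
    yearly = allowed_downtime_hours(99.95)
    monthly = allowed_downtime_hours(99.95, HOURS_PER_MONTH)
    print(f"99.95% over a year:  {yearly:.2f} h")
    print(f"99.95% over 30 days: {monthly * 60:.1f} min")
```

For 99.95% this gives about 4.38 hours per year, or roughly 21.6 minutes per 30-day month.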

We start by creating a new Availability Set in the Azure Portal. VMs placed in a single availability set are distributed across isolated hardware clusters. In case of a hardware or Azure software failure, only a subset of your servers is impacted.

Each Availability Set consists of multiple update and fault domains.

An update domain is a group of VMs and the underlying physical hardware that can be rebooted at the same time. VMs in the same fault domain share common storage, as well as a common power source and network switch.

Our ASCS instance will be placed on two servers, so we configure 2 fault domains and 2 update domains.
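Why two domains of each kind are enough for two VMs can be shown with a simplified model. The sketch below assumes round-robin placement in creation order; this is an illustration, not a guarantee of how Azure maps VMs, and the helper names are hypothetical.

```python
# Simplified model of how VMs in an availability set are spread across
# fault domains (FD) and update domains (UD).
# Assumption: round-robin placement in creation order (illustrative only).
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    vm: str
    fault_domain: int
    update_domain: int


def place_vms(vms: list[str], fault_domains: int, update_domains: int) -> list[Placement]:
    """Assign each VM to an FD and a UD in round-robin order."""
    if fault_domains < 1 or update_domains < 1:
        raise ValueError("an availability set needs at least one FD and one UD")
    return [
        Placement(vm, i % fault_domains, i % update_domains)
        for i, vm in enumerate(vms)
    ]


def survives_single_domain_loss(placements: list[Placement]) -> bool:
    """True if losing any one FD, or rebooting any one UD, leaves a VM running."""
    fds = {p.fault_domain for p in placements}
    uds = {p.update_domain for p in placements}
    return len(fds) > 1 and len(uds) > 1


if __name__ == "__main__":
    for p in place_vms(["FC1", "FC2"], fault_domains=2, update_domains=2):
        print(p)
```

With FC1 and FC2 the model yields FD/UD 0/0 and 1/1, so a hardware fault or a platform update takes down at most one ASCS node at a time.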

Now, during the creation of the VMs, we need to select the Availability Set. The same configuration applies to the SAP application instance. If you plan to deploy multiple application servers, ensure they are placed in the same availability set.

Important! It is not possible to assign or change the Availability Set after the VM is created! You would have to redeploy your VM, so it is good practice to always create and select an availability set.

When you select your VM in the Azure Portal and display its availability set assignment, you can verify that the update and fault domains are different for each VM.

| VM Name | Availability set | Fault domain | Update domain |
|---|---|---|---|
| FC1 | FCAvailabilitySet | 0 | 0 |
| FC2 | FCAvailabilitySet | 1 | 1 |

TIP! When designing Availability Sets for your landscape, it is recommended to create one for each tier of the SAP system. This ensures that at least one VM per tier is available.

The Load Balancer is another component that has to be deployed in Azure to ensure that traffic directed to the message server reaches the VM that is available.

We want to route the internal traffic in our virtual network (not reachable from the internet), so we create an internal load balancer. The IP assigned to the Load Balancer will also be used as the IP of the message server. The next step is to add both nodes of the future cluster to the backend pool and select the ports that should be forwarded.
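The ports to forward depend on the ASCS instance number. The hypothetical helper below derives the commonly used ones from SAP's standard port numbering (nn = two-digit instance number). Treat the list as a starting point and verify it against SAP's "TCP/IP Ports of All SAP Products" overview for your release.

```python
# Hypothetical helper: typical TCP ports for the ASCS load-balancing rules,
# derived from SAP's standard numbering scheme (nn = instance number).
# Verify against SAP's official port list for your release.
from __future__ import annotations


def ascs_lb_ports(instance_number: int) -> dict[str, int]:
    """Return typical ASCS ports keyed by service name."""
    if not 0 <= instance_number <= 99:
        raise ValueError("SAP instance numbers range from 00 to 99")
    nn = instance_number
    return {
        "enqueue": 3200 + nn,                   # 32nn: standalone enqueue server
        "gateway": 3300 + nn,                   # 33nn: integrated gateway in the ASCS
        "message_server": 3600 + nn,            # 36nn: internal message server port
        "message_server_http": 8100 + nn,       # 81nn: message server HTTP port
        "sapstartsrv_http": 50013 + 100 * nn,   # 5nn13: SAP start service (HTTP)
        "sapstartsrv_https": 50014 + 100 * nn,  # 5nn14: SAP start service (HTTPS)
        "smb": 445,                             # Windows file share for sapmnt
    }


if __name__ == "__main__":
    for service, port in ascs_lb_ports(0).items():
        print(f"{service:20} {port}")
```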

If you don’t want to create each of the Azure components manually, you can also use an ARM template. Most of them can be customized, and you can decide whether you want to deploy High Availability.

The Load Balancer checks server availability using a health probe. At the moment we are focusing on the Azure configuration; we will set up the probe on the Windows side later.
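Conceptually, a TCP health probe is simple: the load balancer tries to open a connection to a probe port on each backend VM and sends traffic only to the VMs that accept it. The sketch below models that behavior; the host names and the probe port 62000 are assumptions for illustration. In the cluster setup, only the node that currently owns the clustered SAP IP resource answers on the probe port.

```python
# Minimal sketch of a TCP health probe, as used by an Azure load balancer:
# a backend counts as healthy if it accepts a TCP connection on the probe port.
# Host names and the probe port are illustrative assumptions.
from __future__ import annotations

import socket


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_backends(backends: list[str], port: int) -> list[str]:
    """Return the backends that currently answer the health probe."""
    return [host for host in backends if probe(host, port)]


if __name__ == "__main__":
    # Hypothetical probe port; only the active ASCS node should be listed.
    print(healthy_backends(["FC1", "FC2"], 62000))
```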

In total I have created eight components in Azure:

| Name | Type | Purpose | Comment |
|---|---|---|---|
| FC1 | VM | First node for ASCS | |
| FC2 | VM | Second node for ASCS | |
| FC-AD | VM | Domain controller | Required for HA setup |
| FC-DB | VM | Database instance | HA for the database is not covered in this post |
| FC-SAP | VM | Application server | You can create additional application servers to have a highly available environment |
| FC-LB | Load Balancer | Load Balancer for ASCS instances | Required for Windows Failover Cluster |
| ASCSCCW | Storage Account | Cluster Witness | Required by Windows Failover Cluster |
| FCAvailabilitySet | Availability Set | Availability Set | |

WINDOWS FAILOVER CLUSTER CONFIGURATION

The configuration starts with installing the Failover Clustering Windows feature. You can do it through the Add Roles and Features wizard or with PowerShell.

When the feature is installed, I install the SIOS DataKeeper software for disk synchronization on both servers. The process is very easy. My two servers are part of Active Directory, therefore I choose the option to use domain accounts:

When the installation is complete, your server will be restarted and we can create the Windows Failover Cluster.

The cluster is now created. As we are doing a 2-node setup, I will also add a quorum witness.

We want to leverage as many Azure features as we can, so the quorum witness will be placed in Azure as a cloud witness!

The cloud witness is a storage account created in Azure. On the next screen, we provide its name and access key (which can be found in the Azure Portal).
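The wizard steps above can also be scripted. The hypothetical Python helper below only assembles the corresponding PowerShell commands so they can be reviewed before you run them on a node. The cmdlets (`Install-WindowsFeature`, `Test-Cluster`, `New-Cluster`, `Set-ClusterQuorum -CloudWitness`) are standard Windows Server cmdlets. The cluster name, IP address and key are placeholders. Note that Azure storage account names are lowercase, so ASCSCCW would be entered as `ascsccw`.

```python
# Hypothetical helper that assembles the PowerShell commands for the cluster
# steps (feature install, validation, cluster creation, cloud witness).
# Cluster name, IP address and access key are placeholders.
from __future__ import annotations


def cluster_setup_commands(
    cluster_name: str,
    nodes: list[str],
    cluster_ip: str,
    witness_account: str,
    witness_key: str,
) -> list[str]:
    """Return the PowerShell commands for the cluster setup, in order."""
    if len(nodes) < 2:
        raise ValueError("a failover cluster needs at least two nodes")
    node_list = ",".join(nodes)
    return [
        # 1. On every node: add the Failover Clustering feature and its tools
        "Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools",
        # 2. Validate the nodes before creating the cluster
        f"Test-Cluster -Node {node_list}",
        # 3. Create the cluster; -NoStorage: don't add any disks automatically
        f"New-Cluster -Name {cluster_name} -Node {node_list} -StaticAddress {cluster_ip} -NoStorage",
        # 4. Use the Azure storage account as the cloud witness for the quorum
        f"Set-ClusterQuorum -CloudWitness -AccountName {witness_account} -AccessKey {witness_key}",
    ]


if __name__ == "__main__":
    for cmd in cluster_setup_commands("FCCLUSTER", ["FC1", "FC2"], "10.0.0.10",
                                      "ascsccw", "<storage-account-key>"):
        print(cmd)
```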

The Failover Cluster configuration is finished. Now we need to create the DNS record for the ASCS virtual hostname. It should point to the same IP address that we assigned to the Load Balancer.
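Before starting SWPM, it is worth checking that the new record resolves correctly from the cluster nodes. This minimal sketch assumes Python is available on the node. The virtual hostname `sapascs` and the IP `10.0.0.20` are hypothetical placeholders for your own DNS name and load balancer frontend IP.

```python
# Check that the ASCS virtual hostname resolves to the load balancer's
# frontend IP. Hostname and IP below are hypothetical placeholders.
import socket


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """Return True if any IPv4 address of hostname equals expected_ip."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return False  # the name does not resolve at all
    return expected_ip in {info[4][0] for info in infos}


if __name__ == "__main__":
    print(resolves_to("sapascs", "10.0.0.20"))
```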

SHARED STORAGE WITH SIOS DATAKEEPER

SIOS DataKeeper is a solution that helps us with the shared storage issue. At the time of writing, Microsoft Azure doesn’t offer any storage that can be accessed by two VMs at the same time. DataKeeper continuously synchronizes two selected drives and performs the failover if the first node stops working.

Open the DataKeeper GUI to start the configuration.

Choose Connect to Server to establish communication between both nodes.

We are ready to create a new replication job.

Select the source and target VMs.

During the third step, we need to decide on the compression level and replication mode. Synchronous replication is required when protecting ASCS instances.
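The reason synchronous mode is required can be shown with a simplified model. This is not how SIOS DataKeeper is implemented, and the class is hypothetical. In synchronous mode a write is acknowledged only after the target also has it, so a failover never loses an acknowledged write. In asynchronous mode, writes still waiting to be shipped disappear with the failed source node.

```python
# Simplified model (not SIOS DataKeeper's implementation) of synchronous vs
# asynchronous mirroring and what survives when the source node fails.
from __future__ import annotations


class Mirror:
    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.source: list[str] = []
        self.target: list[str] = []
        self.pending: list[str] = []  # acknowledged but not yet shipped (async only)

    def write(self, block: str) -> None:
        self.source.append(block)
        if self.synchronous:
            self.target.append(block)  # replicate before acknowledging
        else:
            self.pending.append(block)  # acknowledge now, replicate later

    def flush(self) -> None:
        """Ship all pending writes to the target (async mode)."""
        self.target.extend(self.pending)
        self.pending.clear()

    def fail_over(self) -> list[str]:
        """Source node dies: unshipped writes are lost; the target's data survives."""
        self.pending.clear()
        return list(self.target)
```

With `Mirror(True)` every acknowledged write is on the target; with `Mirror(False)` only the writes shipped before the failure survive.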

It takes around 1 minute to finish the configuration. Afterwards, the mirroring begins. The disk is now also visible in Failover Cluster Manager.

When the cluster is running, you can’t access the drive on the second node. That’s the expected behavior.

CENTRAL SERVICES INSTALLATION

Start the Software Provisioning Manager and select the installation of the ASCS instance on the first node.

Select the drive on which the SAP Host Agent will be installed. This shouldn’t be the replicated drive.

On the next screen, we enter the SAP System ID and the virtual hostname (the DNS name that you created a few steps back). The cluster network and shared drive have already been discovered.

The installation has to be a domain installation.

Decide on the password for the domain accounts:

One of the last steps before installing the Central Services instance is to choose the instance number:

When installing in a cluster environment, it is recommended to choose the integrated gateway:

Finally, choose the instance number of the Enqueue Replication Server (ERS), which keeps a copy of the lock entries. In a standard installation, the lock entries are kept only in the ASCS, but as our ASCS can fail over between two nodes, we need ERS to replicate the locks so they survive the failover.
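The role of the ERS can be illustrated with a simplified model of lock-table replication. The class and function names are hypothetical and the logic is not SAP's implementation. Every lock granted by the enqueue server is mirrored to the ERS on the other node. On failover, the enqueue server restarts there and rebuilds its lock table from that copy.

```python
# Simplified model (not SAP's implementation) of lock-table replication
# between the enqueue server in the ASCS and the ERS on the other node.
# Class and function names are hypothetical.
from __future__ import annotations


class EnqueueReplicationServer:
    """Holds a copy of the lock table on the node that does not run the ASCS."""

    def __init__(self) -> None:
        self.locks: set[str] = set()


class EnqueueServer:
    """Holds the primary lock table inside the ASCS instance."""

    def __init__(self, replica: EnqueueReplicationServer | None = None) -> None:
        self.locks: set[str] = set()
        self.replica = replica

    def lock(self, entry: str) -> bool:
        """Grant the lock unless it is already held; mirror it to the ERS."""
        if entry in self.locks:
            return False
        self.locks.add(entry)
        if self.replica is not None:
            self.replica.locks.add(entry)
        return True

    def unlock(self, entry: str) -> None:
        self.locks.discard(entry)
        if self.replica is not None:
            self.replica.locks.discard(entry)


def fail_over(replica: EnqueueReplicationServer | None) -> EnqueueServer:
    """Restart the enqueue server on the surviving node from the ERS copy, if any."""
    new_server = EnqueueServer()  # no replica until a new ERS is started
    if replica is not None:
        new_server.locks = set(replica.locks)
    return new_server
```

Without an ERS, `fail_over(None)` starts with an empty lock table, which is exactly the situation the ERS is meant to prevent.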

And the installation starts. When it’s finished, we can enable the health probe for the ASCS cluster, which we previously declared during the Load Balancer deployment. The easiest way is to execute the PowerShell script created by Microsoft:
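At its core, this step sets the `ProbePort` parameter on the clustered SAP IP resource and restarts the SAP cluster group so the change takes effect. The hypothetical helper below builds equivalent PowerShell commands for review. The resource name pattern "SAP <SID> IP", the group name "SAP <SID>" and probe port 62000 are assumptions, so check the actual names with `Get-ClusterResource` on your cluster. The resource only exists after SWPM has installed the ASCS into the cluster.

```python
# Hypothetical helper: PowerShell commands that set the load balancer probe
# port on the clustered SAP IP resource and restart the SAP cluster group.
# Resource/group naming ("SAP <SID> IP", "SAP <SID>") is an assumption.
from __future__ import annotations


def probe_port_commands(sid: str, probe_port: int) -> list[str]:
    """Return the commands in the order they should be run."""
    if not (len(sid) == 3 and sid.isalnum() and sid[0].isalpha()):
        raise ValueError("an SAP SID has three alphanumeric characters, starting with a letter")
    if not 1 <= probe_port <= 65535:
        raise ValueError("probe port must be a valid TCP port")
    ip_resource = f"SAP {sid.upper()} IP"
    group = f"SAP {sid.upper()}"
    return [
        # Tell the cluster IP resource to answer the load balancer probe
        f'Get-ClusterResource -Name "{ip_resource}" | Set-ClusterParameter -Name ProbePort -Value {probe_port}',
        # Restart the SAP cluster group so the new parameter is applied
        f'Stop-ClusterGroup -Name "{group}"',
        f'Start-ClusterGroup -Name "{group}"',
    ]


if __name__ == "__main__":
    for cmd in probe_port_commands("ABC", 62000):  # hypothetical SID and port
        print(cmd)
```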

During the script execution, you will be asked to restart the cluster. After a moment, we get confirmation that everything went fine.

It’s time to switch to the second node of the cluster and start the Software Provisioning Manager to install the ASCS on the second node.

One of the first steps is to decide which cluster group we want to extend (in case we have more than one).

Then the process looks very similar to the first node installation. Here we need to provide the settings for the ERS instance on the second node:

After we fill in all the parameters, the installation starts. It doesn’t take much time, and finally the ASCS installation on the second node is finished.

DATABASE INSTANCE INSTALLATION

When the ASCS installation is finished, we need to import the database schema. In a non-HA installation, all steps are executed during a single SWPM run. Here, each component (ASCS, database and application server) is installed separately.

We point SWPM to the shared directory that contains the profile parameter files.
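The shared profile directory is normally reached through the ASCS virtual hostname. The same path is requested again when installing the application server. This hypothetical helper builds the UNC path, assuming the standard `\\<virtual host>\sapmnt\<SID>\SYS\profile` layout. The hostname `sapascs` and SID `ABC` in the usage line are placeholders.

```python
# Hypothetical helper: UNC path of the shared SAP profile directory, assuming
# the standard \\<ASCS virtual host>\sapmnt\<SID>\SYS\profile layout.
from pathlib import PureWindowsPath


def profile_dir(virtual_host: str, sid: str) -> PureWindowsPath:
    """Return the profile directory on the sapmnt share of the virtual host."""
    share = PureWindowsPath(f"\\\\{virtual_host}\\sapmnt")  # \\host\sapmnt
    return share / sid.upper() / "SYS" / "profile"


if __name__ == "__main__":
    print(profile_dir("sapascs", "ABC"))  # placeholder hostname and SID
```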

The install is pretty easy so I won’t show all the screens. Just remember to point to the correct installation media.

And after all the data is imported into the database, the installation is finished.

APPLICATION SERVER INSTALLATION

Now we would like to test our configuration. To do that, we need an application server. To simplify this post, I will show how to install only one application instance, but in your highly available environment, you should definitely consider installing more than one.

As always, I’m choosing the custom mode:

We are asked to point to the profile directory:

If that’s your first application server in the landscape, leave the default settings in SAP System DDIC Users:

And the installation starts:

The installation of the highly available SAP system is complete, so we can start testing. There are two things we need to pay attention to:

The system is constantly available – our session should not be interrupted

The lock table is not cleared

For testing purposes, I’m editing the DDIC user, which creates a lock entry in the system:

Notice that the ERS is running on the second node (the inactive one). That’s the correct behavior.

When I shut down the first ASCS node, I can’t see any difference in SAP GUI. The system keeps working correctly and I can switch between screens. When I go to the SAP Management Console, the only difference is that the ERS on FC1 is unavailable (which proves the server is stopped).

Lock entry is still available, so the failover worked as expected!

This is the third part of my blog series about SAP and Azure. You can access the previous parts using the following links:
