
Building a highly available failover cluster with Pacemaker, Corosync & PCS

When running mission-critical services, you don’t want to depend on a single (virtual) machine to provide those services. Even if your systems never crash or hang, from time to time you will need to do maintenance and restart some services, or even the whole machine. Fortunately, clusters were designed to overcome these problems and make it possible to reach near-100% uptime for your services.

Introduction

There are a lot of different scenarios and types of clusters, but here I will focus on a simple, 2 node, high availability cluster that serves a website. The focus is on availability, not on balancing the load over multiple nodes or improving performance. Of course, this example can be expanded or customized to whatever your requirements are.

To reach the service(s) offered by our simple cluster, we will create a virtual IP which represents the cluster nodes, regardless of how many there are. The client only needs to know this virtual IP and doesn’t have to bother with the “real” IP addresses of the nodes, or with which node is the active one.

In a stable situation, our cluster should look something like this:

There is one owner of the virtual IP, in this case node 01. The owner of the virtual IP also provides the service for the cluster at that moment. A client trying to reach our website via 192.168.202.100 will be served the webpages by the webserver running on node 01. In this situation, the second node is not doing anything besides waiting for node 01 to fail so that it can take over. This scenario is called active-passive.

In case something happens to node 01 (the system crashes, the node is no longer reachable or the webserver isn’t responding anymore), node 02 will become the owner of the virtual IP and start its webserver to provide the same services as were running on node 01.

For the client, nothing changes since the virtual IP remains the same. The client doesn’t know that the first node is no longer reachable and sees the same website as before (assuming that the webservers on node 01 and node 02 serve the same webpages).

When we need to do maintenance on one of the nodes, we can manually switch the virtual IP and service owner, do our maintenance on one node, switch back to the first node and do our maintenance on the second node. Without downtime.
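As a preview of the tooling used later in this post, a minimal sketch with pcs (introduced below): putting a node in standby moves its resources to the other node, and unstandby makes it available again as a failover target:

[jensd@node01 ~]$ sudo pcs cluster standby node01
[jensd@node01 ~]$ sudo pcs cluster unstandby node01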

Building the cluster

To build this simple cluster, we need a few basic components:

Service which you want to be always available (webserver, mailserver, file-server,…)

Resource manager that can start and stop resources (like Pacemaker)

Messaging component which is responsible for communication and membership (like Corosync or Heartbeat)

Optionally: file synchronization which will keep filesystems equal at all cluster nodes (with DRBD or GlusterFS)

The example is based on CentOS 7 but should work without modifications on basically all el6 and el7 platforms and with some minor modifications on other Linux distributions as well.

The components we will use are Apache (webserver) as our service, Pacemaker as resource manager, Corosync for messaging (Heartbeat is considered deprecated since CentOS 7) and PCS to manage our cluster easily.

In the examples given, pay attention to the host where the command is executed since that can be critical in getting things to work.

Preparation

Start by configuring both cluster nodes with a static IP and a proper hostname, and make sure that they are in the same subnet and can reach each other by node name. This seems very logical but is easily forgotten and can cause problems later down the road.

[jensd@node01 ~]$ uname -n
node01
[jensd@node01 ~]$ ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 192.168.202.101/24 brd 192.168.202.255 scope global eno16777736

[jensd@node02 ~]$ uname -n
node02
[jensd@node02 ~]$ ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 192.168.202.102/24 brd 192.168.202.255 scope global eno16777736

[jensd@node01 ~]$ ping -c 1 node02
PING node02 (192.168.202.102) 56(84) bytes of data.
64 bytes from node02 (192.168.202.102): icmp_seq=1 ttl=64 time=1.31 ms

--- node02 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.311/1.311/1.311/0.000 ms

[jensd@node02 ~]$ ping -c 1 node01
PING node01 (192.168.202.101) 56(84) bytes of data.
64 bytes from node01 (192.168.202.101): icmp_seq=1 ttl=64 time=0.640 ms

--- node01 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.640/0.640/0.640/0.000 ms

Firewall

Before we can take any actions for our cluster, we need to allow cluster traffic through the firewall (if it’s active on any of the nodes). The details of these firewall rules can be found elsewhere; just assume that this is what you have to open:
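The original rule listing did not survive in this copy. As a sketch, assuming a firewalld-based setup (the CentOS 7 default): the predefined high-availability service covers the cluster ports (TCP 2224 for pcsd, UDP 5404-5405 for Corosync, among others):

[jensd@node0x ~]$ sudo firewall-cmd --permanent --add-service=high-availability
[jensd@node0x ~]$ sudo firewall-cmd --reload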

When testing the cluster, you could temporarily disable the firewall to be sure that blocked ports aren’t causing unexpected problems.

Installation

After setting up the basics, we need to install the packages for the components that we planned to use:

[jensd@node01 ~]$ sudo yum install corosync pcs pacemaker
...
Complete!

[jensd@node02 ~]$ sudo yum install corosync pcs pacemaker
...
Complete!

To manage the cluster nodes, we will use PCS. This gives us a single interface to manage all cluster nodes. Installing the packages also created a user, hacluster, which can be used together with PCS to configure the cluster nodes. Before we can use PCS, we need to configure public key authentication or give the user a password on both nodes:

[jensd@node01 ~]$ sudo passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

[jensd@node02 ~]$ sudo passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Next, start the pcsd service on both nodes:

[jensd@node01 ~]$ sudo systemctl start pcsd

[jensd@node02 ~]$ sudo systemctl start pcsd

Since we will configure all nodes from one point, we need to authenticate on all nodes before we are allowed to change the configuration. Use the previously configured hacluster user and password to do this.

[jensd@node01 ~]$ sudo pcs cluster auth node01 node02
Username: hacluster
Password:
node01: Authorized
node02: Authorized

From here, we can control the cluster by using PCS from node01. It’s no longer required to repeat all commands on both nodes (imagine you need to configure a 100-node cluster without automation).

Create the cluster and add nodes

We’ll start by adding both nodes to a cluster named cluster_web:

[jensd@node01 ~]$ sudo pcs cluster setup --name cluster_web node01 node02
...
node01: Succeeded
node02: Succeeded

The above command creates the cluster node configuration in /etc/corosync/corosync.conf. The syntax in that file is quite readable in case you would like to automate/script this.
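For reference, a pcs-generated configuration for this cluster looks roughly as follows (a sketch; the exact contents depend on the pcs and Corosync versions):

totem {
    version: 2
    cluster_name: cluster_web
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node01
        nodeid: 1
    }
    node {
        ring0_addr: node02
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
}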

After creating the cluster and adding nodes to it, we can start it. The cluster won’t do a lot yet since we didn’t configure any resources.

[jensd@node01 ~]$ sudo pcs cluster start --all
node02: Starting Cluster...
node01: Starting Cluster...

You could also start the pacemaker and corosync services on both nodes (as will happen at boot time) to accomplish this.
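A sketch of that alternative, on each node:

[jensd@node0x ~]$ sudo systemctl start corosync pacemaker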

Checking the status of the cluster at this point (with sudo pcs status) still shows an error regarding STONITH (Shoot The Other Node In The Head). STONITH is a mechanism that ensures you don’t end up with two nodes that both think they are active and claim to be the service and virtual IP owner, a situation also called split brain. Since we have a simple cluster, we’ll just disable the STONITH option:

[jensd@node01 ~]$ sudo pcs property set stonith-enabled=false

While configuring the behavior of the cluster, we can also configure the quorum settings. The quorum is the minimum number of nodes that need to be active for the cluster to be allowed to run. This can be handy in a situation where a lot of nodes provide simultaneous computing power: when too few nodes are available, it’s better to stop the cluster than to deliver a non-working service. By default, a cluster only has quorum when more than half of the total number of nodes is active; for example, a 5-node cluster needs at least 3 active nodes. For a 2 node cluster that means that both nodes need to be available in order for the cluster to be available, which would completely defeat the purpose of our cluster.

To ignore a low quorum:

[jensd@node01 ~]$ sudo pcs property set no-quorum-policy=ignore
[jensd@node01 ~]$ sudo pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-32.el7_0-368c726
 no-quorum-policy: ignore
 stonith-enabled: false

Virtual IP address

The next step is to actually let our cluster do something. We will add a virtual IP to our cluster. This virtual IP is the IP address that clients will contact to reach the services (the webserver in our case). A virtual IP is a resource. To add the resource:
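The original command listing did not survive in this copy; a minimal sketch that matches the resource name and type shown in the status output below, using the example IP (the netmask and monitor interval are assumptions):

[jensd@node01 ~]$ sudo pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.202.100 cidr_netmask=24 op monitor interval=30s
[jensd@node01 ~]$ sudo pcs status resources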

As you can see in the output of the second command, the resource is marked as started. So the new virtual IP address should be reachable:

[jensd@node01 ~]$ ping -c 1 192.168.202.100
PING 192.168.202.100 (192.168.202.100) 56(84) bytes of data.
64 bytes from 192.168.202.100: icmp_seq=1 ttl=64 time=0.066 ms

--- 192.168.202.100 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.066/0.066/0.066/0.000 ms

To see who is the current owner of the resource/virtual IP:

[jensd@node01 ~]$ sudo pcs status | grep virtual_ip
 virtual_ip (ocf::heartbeat:IPaddr2): Started node01

Apache webserver configuration

Once our virtual IP is up and running, we will install and configure the service which we want to make highly available on both nodes: Apache. To start, install Apache and configure a simple static webpage on both nodes, with different contents on each node. This is temporary, so we can check how our cluster functions. Later, the webpages on node 01 and node 02 should be synchronized in order to serve the same website regardless of which node is active.
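The installation command itself is not shown in this copy; on both nodes it would simply be:

[jensd@node0x ~]$ sudo yum install httpd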

In order for the cluster to check that Apache is still active and responding on the active node, we need to create a small test mechanism. For that, we will add a status page that will be queried regularly. The page won’t be available to the outside, to avoid getting the status of the wrong node.

Create a file /etc/httpd/conf.d/serverstatus.conf with the following contents on both nodes:

[jensd@node0x ~]$ cat /etc/httpd/conf.d/serverstatus.conf
Listen 127.0.0.1:80
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>

Disable the current Listen directive in the Apache configuration to avoid trying to listen multiple times on the same port.
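One way to do that, as a sketch (assuming the default configuration file location):

[jensd@node0x ~]$ sudo sed -i 's/^Listen/#Listen/' /etc/httpd/conf/httpd.conf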

Put a simple webpage in the document root of the Apache server on each node that contains the node name, so we know which of the nodes we reach. This is just temporary.

[jensd@node01 ~]$ cat /var/www/html/index.html
<html>
<h1>node01</h1>
</html>

[jensd@node02 ~]$ cat /var/www/html/index.html
<html>
<h1>node02</h1>
</html>

Let the cluster control Apache

Now we will stop the webserver on both nodes. From now on, the cluster is responsible for starting and stopping it. First we need to enable Apache to listen to the outside world again (remember, we disabled the Listen directive in the default configuration). Since we want our website to be served on the virtual IP, we will configure Apache to listen on that IP address.
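A sketch of this step: stop Apache on both nodes, let it listen on the virtual IP, and hand control to the cluster. The resource-creation command is the one quoted verbatim in a comment further down:

[jensd@node0x ~]$ sudo systemctl stop httpd
[jensd@node0x ~]$ echo "Listen 192.168.202.100:80" | sudo tee --append /etc/httpd/conf/httpd.conf
[jensd@node01 ~]$ sudo pcs resource create webserver ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=1min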

By default, the cluster tries to balance the resources over the nodes. That means that the virtual IP, which is a resource, could be started on a different node than the webserver resource. Starting the webserver on a node that isn’t the owner of the virtual IP will cause it to fail, since we configured Apache to listen on the virtual IP. To make sure that the virtual IP and webserver always stay together, we can add a constraint:
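A minimal sketch, assuming standard pcs syntax, that glues the webserver to whichever node owns the virtual IP:

[jensd@node01 ~]$ sudo pcs constraint colocation add webserver virtual_ip INFINITY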

To avoid the situation where the webserver starts before the virtual IP is up on a node, we need to add another constraint which determines the order in which both resources start:
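A sketch of the ordering constraint, plus enabling the cluster services at boot (rebooting a node is what exposes the bug discussed next):

[jensd@node01 ~]$ sudo pcs constraint order virtual_ip then webserver
[jensd@node01 ~]$ sudo pcs cluster enable --all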

When a node reboots, you may find that Corosync fails to start at boot even though it works fine when started manually. Apparently, this is a known bug, described in Red Hat Bugzilla bug #1030583.

It seems that the interfaces report to systemd that they are available, and the network-online target is reached, while they actually still need some time before they can be used.

A possible (not so clean) workaround is to delay the Corosync start by 10 seconds, to be sure that the network interfaces are usable. To do so, edit the systemd service file for Corosync: /usr/lib/systemd/system/corosync.service

[Unit]
Description=Corosync Cluster Engine
ConditionKernelCommandLine=!nocluster
Requires=network-online.target
After=network-online.target

[Service]
ExecStartPre=/usr/bin/sleep 10
ExecStart=/usr/share/corosync/corosync start
ExecStop=/usr/share/corosync/corosync stop
Type=forking

[Install]
WantedBy=multi-user.target

The ExecStartPre=/usr/bin/sleep 10 line was added to get the desired delay when starting Corosync.

After changing service files (customized unit files should actually reside in /etc/systemd/system), reload the systemd daemon:
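Following that advice, you could copy the unit file into /etc/systemd/system before editing it, for example:

[jensd@node0x ~]$ sudo cp /usr/lib/systemd/system/corosync.service /etc/systemd/system/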

[jensd@node01 ~]$ sudo systemctl daemon-reload

[jensd@node02 ~]$ sudo systemctl daemon-reload

After rebooting the system, you should see that the cluster started as it should and that the resources are started automatically.

Now you have a 2 node web-cluster that enables you to reach a much higher uptime. The next and last thing to do is to ensure that both webservers serve the same webpages to the client. In order to do so, you can configure DRBD. More about that in this post: Use DRBD in a cluster with Corosync and Pacemaker on CentOS 7

Comments

It seems that the configuration and settings do not survive a reboot on one of the cluster nodes. I tried this a couple of times but I still get the same problem during reboot.
Did I miss something from your configuration? Can you help figure out the problem?

Hi there,
Excellent material on clusters in general and on how to build one using pcs. This is the first time I have created a cluster in Linux [CentOS EL v7] and without this step-by-step blog it would have meant hours of frustration and reading.

I have got the 2 node cluster up & running, and I can see that the webpage [virtual IP] works smoothly with node failover. The only small issue I am facing is that it’s showing ‘webserver’ as stopped, with a couple of errors. I tried to dig in but didn’t get any clue. Any advice would be greatly appreciated.

Do you see the same output on the second node? (centos7p)
If your website is reachable via the virtual IP, I expect it to be started over there.

You could have a look at the Apache log (/var/log/httpd/error_log) for errors. My guess is that, while Apache is started on the first node, the cluster is still trying to start it on the second node, which causes problems since Apache there can’t listen on the virtual IP and port.

I managed to get 2 nodes working with your guide, with some minor modifications. However, I was having trouble getting DRBD to work, since it doesn’t seem to have a version for CentOS 7. I then tried to use a CentOS 6.5 system, but apparently PCS doesn’t work on CentOS 6.5, so now I’m kinda stuck. How can I get a Linux HA setup with DRBD working on CentOS?

DRBD should work on CentOS 7 in a very similar way as it does on 6.5. A while ago, I started to work on a blog post about DRBD but haven’t found the time to finish it yet. I’ll do my best to get it online this week :)

Thanks for the info shared with the world. It helped a lot…
I too managed to get 2 nodes working with your guide. I am also trying to bring up DRBD on the same CentOS 7.
If you have completed a similar guide for DRBD, please share the link.

Sure. Thank you. I was able to get this setup to work just fine locally but I’m having a hard time with my VPS provider. I have a local IP and a public IP. In /etc/hosts, “nginx1” and “nginx2” map to their respective public IPs.

This looks like a firewall issue (or something like SELinux or Apparmor). Can you telnet to the ports that I opened in the firewall between the hosts? Maybe test it by temporarily disabling security measures.

In my CentOS 7 sandbox, I have Firewalld instead of iptables, which means that I couldn’t use the instructions in this article for setting the firewall. To enable the firewall for the required ports, I ran the equivalent firewall-cmd commands.

Great tutorial Jens, really appreciate all the effort! As I’m on CentOS 7, I needed the firewall-cmd commands from Jorge to make it all work (thanks for that!) I also see the same weird messages Miranda mentioned, but apart from that the cluster seems to be working fine.

Thanks for a very useful post.
But I have a question: how can I create a resource to monitor a daemon that is not in the agent list (pcs resource agents ocf:heartbeat), like you did with Apache:
sudo pcs resource create webserver ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=1min

You can get a list of the supported resource types with pcs resource list. If there’s no direct support for the type you want to monitor, you can monitor the systemd service instead. It depends a little on the type of service you have.
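As a sketch of that suggestion, assuming a hypothetical unit name myservice, monitoring a systemd service looks like this:

[jensd@node01 ~]$ sudo pcs resource create myservice systemd:myservice op monitor interval=30s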

Thanks so much,
I have it running now with a resource like this, where “myinitd” is a startup script in /etc/init.d:
sudo pcs resource create server lsb:myinitd op monitor interval="30" timeout="60" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60" \
meta target-role="Started"
But with this config, Pacemaker doesn’t see when the service stops. That is, it doesn’t notice when I type:
/etc/init.d/myinitd stop/start
Pacemaker only seems to notice failures at the node level. How can I configure it at the service level?

Hi, I am also facing a similar issue with my custom “ARCluster” script in /etc/init.d. The service is not being failed over to another node on /etc/init.d/ARCluster stop/start.
pcs resource show ARCluster
Resource: ARCluster (class=lsb type=arcluster)
Meta Attrs: target-role=Started
Operations: monitor interval=60s (ARCluster-monitor-interval-60s)

Please let me know what changes need to be done, if you were able to solve your issue.

I succeeded on the first try, thanks for the tutorial.
I will go on with DRBD.

But I have a question:
We have a small webserver where a few cron jobs generate the webpage content, querying data from other sources to show on the page.
So, is it possible to insert these cron jobs into the cluster, so that if node01 is down, node02 can carry on with the crons?

I don’t know if it’s possible to control cron with Corosync, but it seems like a weird idea. In your place, I would let the cron job check whether it’s on the active node (for example, by checking the response code of curl http://localhost/server-status) and then execute or skip execution.

I wouldn’t disable cron, because it’s used for other useful stuff too (like logrotate). Personally, I would add the cron script on both nodes and include the check for an active node at the beginning of the script. Both scripts get executed by cron, but only the one on the active node actually makes changes, because it passes the check.
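A sketch of such a wrapper (the job path is hypothetical); cron runs it on both nodes, but the real work only happens where the local server-status page answers:

#!/bin/sh
# Runs from cron on both nodes; Apache (and thus /server-status) only
# answers on the node where the cluster has started the resources.
if curl -sf http://localhost/server-status >/dev/null; then
    /usr/local/bin/real-cronjob.sh   # hypothetical actual job
fi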

Good guide, but I’m running both servers and all was fine until the step where I must start the cluster: “pcs cluster start --all”. After a long wait, the following error appears on the terminal:
node01: Starting Cluster...
Redirecting to /bin/systemctl start corosync.service
Job for corosync.service failed. See ‘systemctl status corosync.service’ and ‘journalctl -xn’ for details.
node02: Starting Cluster...
Redirecting to /bin/systemctl start corosync.service
Job for corosync.service failed. See ‘systemctl status corosync.service’ and ‘journalctl -xn’ for details.
I googled this error, but can’t find any fix or solution.
I disabled the firewall and SELinux, changed the corosync file in the system folder and reloaded the daemons, but the errors persist.
I appreciate any help to continue with the steps.
Thanks in advance!

Hi,
I resolved the problem. The error appeared because I hadn’t configured each node’s own IP-hostname mapping in /etc/hosts; I had only configured each node’s entry for the other node.
So, I have another question: why disable STONITH in a simple cluster (2 nodes)?
Thanks in advance!

Hi, your tutorial is great and it saved me. I use the server status for crontab schedules too.
So, my question: is there any tutorial for MySQL (MariaDB)? Let’s say I will use NFS for the datastore and manage the MySQL resource with pcs…

Hi! Firstly, thanks a lot for your post, and I would like to ask some questions!
Number 1: after node 1 dies, the service switches to node 2. That’s OK, but after node 1 comes up again, the system switches back to node 1. How can I configure the cluster to keep using node 2?
Number 2: I would like to add one more node to the cluster to act as a judge. I mean, the third node can judge which node is the primary and should be active. How can I get that?
Thanks sir so much! :)

Thanks for this great tutorial, it’s really well written.
BTW people, what are your experiences with Corosync?
I must say it’s really disappointing.
The first thing I noticed is that for some reason you can’t do an hb_takeover; you have to play around with un/standby or cluster stop commands.
And if you try to do that, nothing happens. Why? Because somewhere there is a default timeout of around 5 minutes; I still have to figure out where it is…
And after it finally dies, does the other node take over? No, it just sits there playing dumb and saying it’s online. Does the log file say why it didn’t take over? No ;)

So out of the box, you would expect a service that is rock solid, doesn’t freeze, and does a takeover in a second or so at best, but you basically get another version of Heartbeat with all its flaws…

I would want Pacemaker to restart my Apache server when it goes down. In my case the node is active and has not undergone a failure; if my Apache server goes down, I want Pacemaker to restart it. How can I do that?

This might seem like a silly question, but I am new to Linux administration. I am trying to set up a high availability cluster on Red Hat. I have tried your tutorial and I am currently stuck at a particular point.
I am trying to add a few custom OCF resource agents. These were not coded by me; they were provided as part of a suite by IBM. According to this link http://www.linux-ha.org/doc/dev-guides/_installing_and_packaging_resource_agents.html I have to place the OCF resource files in “/usr/lib/ocf/resource.d/”, so I placed them under /usr/lib/ocf/resource.d/ibm.

When I run the command pcs resource providers, it lists the ibm folder along with heartbeat, and when I run pcs resource agents ocf:ibm it lists all the resource agents under that folder.

However, when I try to add a resource to the cluster using pcs resource create with the agents I installed under ibm, it gives me an error: Unable to create resource, it is not installed on this system (use --force to override)

Well, you will have to locate both servers in the same network, meaning layer 2. So for example you have servers with IP addresses 10.64.38.68 and 10.64.38.69. If 10.64.38.70 is free, then you can use it as the virtual IP.

The “real” IPs have to be in the same subnet. As far as I know (it’s been a while), the active node responds to the ARP request for the virtual IP with its MAC address. If the nodes were in a different subnet/VLAN, they wouldn’t receive the ARP request for the virtual IP…
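To observe this behaviour, a sketch (the interface name is an assumption): send ARP requests for the virtual IP from a third machine in the subnet and watch which MAC address answers before and after a failover:

[jensd@client ~]$ arping -I eno16777736 -c 3 192.168.202.100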

Hi,
I’ve implemented a cluster (active/passive) with Apache on CentOS 7 using Pacemaker & Corosync, exactly as described in this guide (excellent material). Everything related to the cluster operation is working fine. But I have a requirement that I don’t know how to implement… I’ve made a post in the CentOS forum, but nobody answered… perhaps somebody here can resolve this:
The problem is that this Apache is also going to be our load balancer. We need high availability and the flexibility of Apache’s reload operation. Part of our daily tasks is to add sites, add nodes to the balanced applications, etc. So we need a cluster operation that permits us to invoke the Apache reload (graceful) anytime we need, without disrupting the service (established user connections, etc.)

We are using the ocf::heartbeat:apache service, and these operations aren’t available…

I’ve implemented a cluster (active/passive) with Apache on CentOS 7 using Pacemaker & Corosync, exactly as described in this guide (excellent material). Everything related to the cluster operation is working fine, until the need to reboot the primary node.

Once this happens, the VIP fails over to the second node. However, once the primary node comes back online, the VIP doesn’t fail back and the cluster appears to have a communication issue.

Issue: pcs cluster stop --force on both nodes, then pcs cluster start --all (from the first node) clears the issue.

This is with /usr/lib/systemd/system/corosync.service edited:
ExecStartPre=/usr/bin/sleep 10
Am I missing something regarding recovery from failure? Do any commands need to be issued in order for node1 to come back (it used to be automatic)?

pcsd seems to just go wrong on the second node and won’t work with the cluster.

Regarding failover and failback:
el6 Pacemaker with crm/pcsd/Corosync
el6 Heartbeat with crm
Both work fine.

Hi,
really nice article. I followed these procedures and I’m facing this issue:
In the beginning, having one export, everything works fine. I cold-reset the nodes one after another and the services are migrated each time successfully to the other node. But when I add another export directory (with a different fsid than the first one), after the first reboot of the active node, the NFS server does not start on one node or the other. The error that I get is that “rpcbind is not running”. While tailing /var/log/messages I see a repeating message of:
nfsserver: INFO: Start: rpcbind i:1
nfsserver: INFO: Start: rpcbind i:2
nfsserver: INFO: Start: rpcbind i:3
nfsserver: INFO: Start: rpcbind i:4

and so on.

After this, the NFS service never starts again on either node.
After a fresh restart of both nodes, when I try to add an NFS server resource again, the error that I get is:
“Failed to start NFS server: /proc/fs/nfsd/threads”. In /var/log/messages I get: ERROR: nfs-mountd is not running.

Very useful tutorial. I have one request: could you please create and share a tutorial on spanning the cluster across locations? I have one machine at a remote location and I want to add that machine to my existing cluster.

Hi jensd. I keep getting this error. I have enabled the Apache status URL. A reboot also didn’t solve the problem. Whenever I start the cluster, it always redirects to server2. I also cannot access the virtual IP from the web. Disabling the SELinux policy also didn’t solve the problem. Please help :(

It worked smoothly, thanks a lot man. Going to try it with Zimbra to see how it works. I noticed httpd isn’t running; it says the socket is used by another app. I suppose this is for the virtual IP or maybe the cluster, right?
Cheers!
PS: also, the first time it didn’t work; I restarted node1 and then it worked without problems.

Thank you, Jens. This is the best howto article I’ve read on this topic.
I have a question, though.
I’m supposed to set up a 2 node cluster where the client connects to the cluster using SSH and SFTP to execute command line commands like “cp”, “java -jar”, etc. There is no need for a web application to run on the cluster nodes.
Is it possible to set up the cluster by specifying the SSH/SFTP service?

Hi all, very nice tutorial on setting up clustering. I have an application consisting of a shell script which runs multiple executables (project modules) to execute one task (recording audio and video from hardware over IP).
My question is how I can set up this shell script to run on both nodes: when one node fails, how do I hand control to the other node so that the packets from the hardware are routed to the second node?

I noticed that httpd.service is not started on either node, but the webserver still works and port 80 is in service on one of the nodes. Would you please explain the reason? I feel confused about this aspect.

This is a great article. I am using GFS2 with clvmd, and my GFS filesystem fails to come up on reboot. When I look into the log, it looks like LVM is starting before clvmd, which may be why it’s failing. I do have an order constraint that starts clvmd -> LVM -> Filesystem.
I have exclusive=false on the LVM resource too.