Stretch NFS cluster

I was asked to create an NFS stretch cluster in which one node sits at the first site and serves its local clients,
while another node sits at the second site and serves its own local clients.
No real failover should occur, but the content must be shared.

The proposed solution uses DRBD (to emulate a shared disk)
and GFS2 as the clustered file system.
A third node at a third location will be used as a quorum node,
and a disconnected node should fence itself (commit suicide).
A real fence device cannot be used, because when a site is disconnected its fencing device becomes unavailable as well.
This time I chose CentOS 7 as the base OS.

Preparing POC

As usual, the POC will run in a KVM environment.
So, you need to create three routed networks that mimic these three sites:
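
A minimal sketch of how such network definitions could be generated. The network names, bridge names and subnets below are assumptions made for illustration, not values from the original setup; only the routed forward mode comes from the text above.

```python
# Hypothetical site names, bridge names and gateway addresses; adjust to your lab.
SITES = [
    ("site1", "virbr101", "192.168.101.1"),
    ("site2", "virbr102", "192.168.102.1"),
    ("site3", "virbr103", "192.168.103.1"),
]

# libvirt network definition with routed (not NAT) forwarding.
TEMPLATE = """<network>
  <name>{name}</name>
  <forward mode='route'/>
  <bridge name='{bridge}' stp='on' delay='0'/>
  <ip address='{gateway}' netmask='255.255.255.0'/>
</network>
"""


def network_xml(name, bridge, gateway):
    """Return the libvirt XML definition for one routed network."""
    return TEMPLATE.format(name=name, bridge=bridge, gateway=gateway)


def all_networks(sites=SITES):
    """Map each network name to its XML definition."""
    return {name: network_xml(name, bridge, gw) for name, bridge, gw in sites}


if __name__ == "__main__":
    for name, definition in all_networks().items():
        print("# %s.xml" % name)
        print(definition)
```

Each definition can be saved to a file and loaded on the KVM host with "virsh net-define", then activated with "virsh net-start" and "virsh net-autostart".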

Creating cluster

Because these virtual machines cannot reach the Internet in the current POC configuration,
installing software can be difficult.
You can install all the necessary software in a template before cloning, use a proxy on the host, or temporarily move the VM to another network.

Configuring fencing and quorum

After installing the cluster, it's time to configure it.
First we need to understand what we want from it.
The usual fencing strategy will not work for a stretch cluster.
The typical failure is the disconnection of an entire site;
in that case it is not possible to fence the detached node,
because its fencing device is unreachable as well.
Therefore, the third node in the third location will act as a quorum node but will not run any resources.
We will then configure the cluster to require a quorum of 2 votes out of 3.
A failed cluster partition (a single node, without quorum) will commit suicide (reboot itself),
which, of course, stops service at that site but preserves the integrity of the data.
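
The quorum arithmetic above can be illustrated with a small sketch. It assumes one vote per node and a simple majority rule; the function names are made up for this example and are not part of any cluster tool.

```python
def quorum_threshold(total_votes):
    """Simple majority: more than half of all votes."""
    return total_votes // 2 + 1


def has_quorum(partition_votes, total_votes=3):
    """True if a partition holding partition_votes may keep running."""
    return partition_votes >= quorum_threshold(total_votes)


def action_for_partition(partition_votes, total_votes=3):
    """What a partition should do after the cluster splits."""
    if has_quorum(partition_votes, total_votes):
        return "keep running"
    return "reboot"


if __name__ == "__main__":
    # Three nodes, one vote each: two data nodes plus the quorum node.
    print("threshold:", quorum_threshold(3))
    # An isolated site holds a single vote.
    print("isolated site:", action_for_partition(1))
    # The surviving site still sees the quorum node, so it holds two votes.
    print("site + quorum node:", action_for_partition(2))
```

With three votes the threshold is 2, so a site that loses contact with both other locations drops to one vote and must reboot, while the other site together with the quorum node keeps serving its clients.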

Searching the Internet, I found that the suicide fence plug-in is still available only for SuSE.
Other distributions have removed it because end users misused it.
After studying some of the existing /usr/sbin/fence_* Python scripts,
I wrote my own script, fence_suicide.
This script lies to the cluster, reporting that it has successfully killed a neighbor,
and runs "reboot -f" if it is called for its own node.
Install it on all nodes.
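
The decision logic described above might look roughly like the sketch below. This is not the original fence_suicide script, only an illustration of the idea. It assumes the agent receives "key=value" options on standard input, and the option name "nodename" for the fencing target is an assumption. A complete agent would also have to answer the metadata and monitor actions, which are left out here.

```python
import socket
import subprocess


def parse_options(text):
    """Parse 'key=value' lines, as fence agents typically receive on stdin."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def decide(target, local_name):
    """Compare short host names and choose what to do."""
    if target.split(".")[0] == local_name.split(".")[0]:
        return "reboot-self"
    return "pretend-success"


def fence(options, local_name=None, dry_run=True):
    """Return 0 (success) in every case; reboot only when we are the target."""
    local_name = local_name or socket.gethostname()
    # "nodename" is an assumed option name for the node to be fenced.
    target = options.get("nodename", "")
    if decide(target, local_name) == "reboot-self" and not dry_run:
        subprocess.call(["reboot", "-f"])
    # For any other node we simply claim that fencing succeeded.
    return 0


if __name__ == "__main__":
    sample = "action=reboot\nnodename=node2\n"
    # dry_run stays True here, so nothing is rebooted.
    print(decide(parse_options(sample)["nodename"], "node1"))
```

The dry_run default keeps the sketch harmless; a real agent would read sys.stdin and call fence() with dry_run=False.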

As you can see, the resource is connected, in the Secondary role on both nodes, and its data is marked Inconsistent.
Let's tell DRBD that the initial resynchronization is not required, because our disk is initially empty: