Note: this was done more as an experiment than for something I intended to use in production – so consider it to be more a compilation of notes than a full out procedure.

DRBD – Distributed Replicated Block Device – is a kernel-level storage system that replicates data across a network. It uses TCP and typically runs on ports starting at 7788 (one per resource). A typical setup pairs DRBD with Heartbeat/Corosync, so that in the event of the failure of a node, the other node can be promoted to primary, or uses a dual-primary setup together with a cluster filesystem so that both nodes can access the data simultaneously.

The setup described below will only allow one node to access the data at any given time and requires a manual failover to promote the secondary node to primary.

For the following, I am using 2 up-to-date instances running Amazon’s Linux AMI 2011.09 (ami-31814f58) – which is derived from CentOS/RHEL. Both are in the same security group, and these are the only two instances in that security group. Also, the hostnames of both instances are unchanged from their defaults – this is only relevant if you try to use the script included below; if you manually set up the configuration, the hostnames can be whatever you wish.

I have attached one EBS volume to each instance (in addition to the root volume), at /dev/sdf (which is actually /dev/xvdf on Linux).

Install DRBD

Note: all steps in this section are to be performed on both nodes

This AMI already includes the DRBD kernel module in its default kernel. You can verify this with the following:

modprobe -l | grep drbd
kernel/drivers/block/drbd/drbd.ko

Likewise, to find the version of the kernel module, you can use:

modinfo drbd | grep version
version: 8.3.8

It is typically preferable to have the version of the kernel module match the version of the userland binaries. DRBD is no longer included in the CentOS 6 repository – and is not in either the amzn or EPEL repositories. The remaining options are therefore to use another repository or to build from source – I’d favour the former.

ElRepo – which contains primarily hardware-related packages – maintains up-to-date binaries for CentOS and its derivatives – we can either install a specific RPM or simply use the latest copy from the repository.
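As a sketch, installing from ElRepo might look like the following – the release RPM URL and package names are assumptions (check the ElRepo site for the current ones); drbd83-utils is chosen to match the 8.3.8 kernel module found above:

```
# Import ElRepo's signing key and install the repository definition
# (URL and release number are illustrative - see https://elrepo.org)
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://elrepo.org/elrepo-release-6-4.el6.elrepo.noarch.rpm

# Install the 8.3 userland utilities to match the 8.3.x kernel module
yum install -y drbd83-utils
```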

Setup meta-data storage

Note: all steps in this section are to be performed on both nodes

DRBD can store meta-data internally or externally. Internal storage tends to be easier to recover, while external storage tends to offer better latency. Moreover, for EBS volumes using an XFS filesystem with existing data, external meta-data is required (since there is typically no place to store the meta-data on the disk – XFS can’t shrink, and an EBS volume can’t be enlarged directly).

According to the DRBD User Guide, meta-data size, in sectors, can be calculated with:

echo $(((`blockdev --getsz /dev/xvdf`/32768)+72))
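As a worked example of that formula, using an illustrative 8 GiB volume (the sector count stands in for the output of blockdev --getsz):

```shell
# Worked example of the meta-data size formula, in 512-byte sectors:
#   size = (device_size_in_sectors / 32768) + 72
SECTORS=16777216                      # stand-in for `blockdev --getsz /dev/xvdf` on an 8 GiB volume
META_SECTORS=$(( (SECTORS / 32768) + 72 ))
echo "$META_SECTORS"                  # 584 sectors, i.e. roughly 292 KiB
```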

However, for external meta-data disks, it appears that you need 128MiB per index (disk). Creating a smaller disk will result in the error “Meta device too small”.

To create our meta-data storage (/var/drbd-meta – change as desired) – initially zeroed out – we will use dd, with /dev/zero as an input source, and then attach the file to a loopback device.
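A minimal sketch of those two steps (the /tmp path below is purely for illustration – the article uses /var/drbd-meta; the losetup step needs root):

```shell
# Create a zeroed 128 MiB file to hold the external meta-data.
# (/tmp/drbd-meta-demo is illustrative; the article uses /var/drbd-meta.)
META_FILE=/tmp/drbd-meta-demo
dd if=/dev/zero of="$META_FILE" bs=1M count=128 2>/dev/null
stat -c %s "$META_FILE"               # 134217728 bytes = 128 MiB

# Attaching the file to a loopback device requires root, so it is shown
# here for reference only:
#   losetup /dev/loop0 /var/drbd-meta
```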

Configure DRBD

The default DRBD install creates /etc/drbd.conf – which includes /etc/drbd.d/global_common.conf and /etc/drbd.d/*.res. You will want to make some changes to global_common.conf – for performance and error handling, but for now I am just using the default.

You will need to know the hostname and IP address of both instances in your cluster to set up a resource file. It is important to note that DRBD uses the IP address of the local machine to determine which interface to bind to – therefore, you must use the private IP address for the local machine.

You can, of course, use an elastic IP as the public IP address. The default port used by DRBD is 7788, and I have used the same below – you need to open this port (TCP) in your security group.
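The resource template itself is not reproduced in these notes; a minimal DRBD 8.3-style file using the placeholder names described below might look like this (device names and placeholders are illustrative):

```
resource drbd_res0 {
        protocol C;

        device    /dev/drbd0;
        disk      /dev/xvdf;        # the attached EBS volume
        meta-disk /dev/loop0 [0];   # the external meta-data device

        on LOCAL_HOSTNAME {
                address LOCAL_IP:7788;
        }
        on REMOTE_HOSTNAME {
                address REMOTE_IP:7788;
        }
}
```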

The above ‘resource’ defines the basic information about the disk and the instances. Note: you should change the ‘disk’ to match the device name you attached your EBS volume as, and ‘meta-disk’ should correspond to the device set up above (or use internal).

If you manually replace the template placeholders above, you must use the private IP address for LOCAL_IP; however, you can use either the public or private IP for REMOTE_IP. The LOCAL_HOSTNAME and REMOTE_HOSTNAME values should match the output of the hostname command on each system. Keep in mind that if you are using a public IP address, you may incur data transfer charges (also keep in mind that an elastic IP resolves to the private IP address from within EC2, which will save on data transfer charges). Also, the file extension should be .res (not .tmpl) if you make the replacement manually.

A typical setup would have identical resource files on both the local and remote machines. If we wish to use the public IP addresses, this is not possible (since the public IP is not associated with an interface in EC2). Therefore, I used the following script to set up the correct values in the above file (note, you need to set up your private key and certificate in order to use the API tools):
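The original script is not reproduced here; as a hedged sketch, its substitution step might look like the following – the hostname/IP values are hardcoded stand-ins for what the real script derives from ec2-describe-instances, and the paths are illustrative:

```shell
# Illustrative template (in practice, your .tmpl file in /etc/drbd.d)
cat > /tmp/drbd_res0.tmpl <<'EOF'
resource drbd_res0 {
        on LOCAL_HOSTNAME {
                address LOCAL_IP:7788;
        }
        on REMOTE_HOSTNAME {
                address REMOTE_IP:7788;
        }
}
EOF

# Stand-in values; the real script parses these out of
# ec2-describe-instances for the instances in the security group.
LOCAL_HOSTNAME=ip-10-0-0-1
LOCAL_IP=10.0.0.1
REMOTE_HOSTNAME=ip-10-0-0-2
REMOTE_IP=10.0.0.2

# Fill in the placeholders; in practice the output would be written to
# /etc/drbd.d/drbd_res0.res
sed -e "s/LOCAL_HOSTNAME/$LOCAL_HOSTNAME/" \
    -e "s/LOCAL_IP/$LOCAL_IP/" \
    -e "s/REMOTE_HOSTNAME/$REMOTE_HOSTNAME/" \
    -e "s/REMOTE_IP/$REMOTE_IP/" \
    /tmp/drbd_res0.tmpl > /tmp/drbd_res0.res
```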

Of course, there are a few shortcomings to the above: the script uses the security group to determine the servers in the cluster, so it requires both instances to be in the same security group and will only work if that group contains exactly two instances (the local and one remote); it also expects the hostnames to be unchanged (i.e. the values derived from ec2-describe-instances). (It would be trivial to modify it to use something other than the security group – for instance, a specific tag – but handling more than two instances matching the criteria would take a bit more effort.)

At this point you should have an /etc/drbd.d/drbd_res0.res file on both nodes, with the appropriate information filled in (either manually or using a script) – it is worth mentioning that the filename doesn’t actually matter (as long as it ends in .res – which is what /etc/drbd.conf is set up to look for).

Final steps

We are just about done at this point – everything is configured, and DRBD is set up on each instance. We now need to actually create the meta-data disk for our specific resource (run on both nodes):

drbdadm create-md drbd_res0

Finally, we start DRBD (on both nodes):

service drbd start

We can find the status of our nodes by using service drbd status, drbd-overview, or cat /proc/drbd:
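As an illustration (values are representative of DRBD 8.3, not the actual output from these instances), cat /proc/drbd on freshly connected, not-yet-synced nodes reports something like:

```
version: 8.3.8 (api:88/proto:86-94)
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:8388316
```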

At this point, we have not actually defined which node is to be the primary node – both are therefore classed as secondary, something we will resolve momentarily.

Up until this point, all steps have been done on both instances. Without a dual-primary/cluster file system setup, the DRBD files will only be accessible to one instance at a time. The primary node will be able to read and write to the volume, but the secondary node will not. In a failover scenario, we would promote the secondary node to primary, and it will then have full access to the volume.

We must now promote one node to primary. It is important to note that you cannot promote a node to primary if the nodes are inconsistent (see the status above). To do so, initially, you will need to use the --overwrite-data-of-peer option. Be careful, as this option will completely overwrite the data on the other node:

drbdadm -- --overwrite-data-of-peer primary drbd_res0

If the nodes are UpToDate, you can use:

drbdadm primary drbd_res0

Checking the status of our nodes will now reveal that one is primary and, if necessary, that a sync is in progress:
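For illustration (again, representative of DRBD 8.3 rather than the actual output), a node freshly promoted to primary and syncing to its peer might show:

```
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:123456 nr:0 dw:0 dr:124000 al:0 bm:7 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:8264860
        [>....................] sync'ed:  1.5% (8070/8190)M
```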

To be able to simultaneously access the data on both nodes, we need to set up both nodes as primary and use a cluster file system – such as OCFS2 or GFS2 (instead of XFS) – in order to minimize the risk of inconsistencies. That, however, is an experiment for a future date. (Of course, there are other alternatives to DRBD – my personal preference being GlusterFS on EC2, which, while having a bit of additional overhead, is simpler to set up and has quite a few more features.)