Provide Robust Clustered Storage with Linux and GFS

Load balancing is difficult: we often need to share file systems via NFS or other mechanisms to provide a central location for the data. While you may be protected against a Web server node failure, you still share fate with the central storage node. Using GFS, the free clustered file system for Linux, you can create a truly robust cluster that does not depend on other servers. In this article, we show you how to configure GFS properly.

Conceptually, a clustered file system allows multiple operating systems to mount the same file system and write to it at the same time. Many clustered file systems are available, including Sun's Lustre, Oracle's OCFS, and GFS for Linux.

There are a few methods by which a block device can be made available on multiple servers at once. You can zone a SAN LUN to be visible to multiple servers, configure iSCSI to do the same, or use DRBD to replicate a partition between two servers. With DRBD, you will need to configure it in Primary/Primary mode to use GFS, but otherwise the configuration is the same one we wrote about a few weeks ago.
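For the DRBD case, the only change from a standard setup is allowing both nodes to be Primary at the same time. A minimal sketch of the relevant drbd.conf fragment follows; the resource name r0 is just an example, and become-primary-on requires a reasonably recent DRBD release:

```
resource r0 {
  protocol C;                # synchronous replication; required for a clustered FS
  net {
    allow-two-primaries;     # permit Primary/Primary, which GFS needs
  }
  startup {
    become-primary-on both;  # promote both nodes automatically at startup
  }
  # disk, device, and per-host sections as in a normal DRBD setup
}
```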

GFS Requirements

Running GFS means you are running a cluster. By far, the easiest way to accomplish this is by using the Red Hat Cluster Suite (RHCS), available in CentOS 5. The following packages are required: cman, the cluster manager; lvm2-cluster, the CLVM package to enable cluster support in LVM; kmod-gfs, the GFS kernel module; and finally gfs-utils.
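On CentOS 5, all four can be pulled in with yum; this is a sketch, and exact package names may vary slightly between releases:

```shell
yum install cman lvm2-cluster kmod-gfs gfs-utils
```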

The cluster manager (cman) takes care of necessities like the distributed lock manager and fencing. Using CentOS or RHEL is highly recommended unless you want to spend time figuring out how the various distros broke the cman package when they adopted it (they always do). You will also get the most recent releases of the various cluster services Red Hat maintains, along with a predictable environment.

Fencing is absolutely required. Some how-to articles recommend setting the fence mode to "manual" because fencing can be complex to configure. Fencing means isolating a misbehaving node from the shared storage, usually by immediately powering it off. If the cluster is unable to fence such a node, you will end up with a corrupt GFS, so do not skip this step.

Creating the Cluster Configuration

The cluster is largely configured via cluster.conf in /etc/cluster/. I do not recommend using the various cluster management applications that generate this configuration file. Even fully supported RHEL applications like Conga, as recently as two months ago, often generate cluster.conf files that are invalid and cannot be parsed by the necessary services.

The configuration file, in beautiful XML, is straightforward. First we start with naming the cluster; we're calling this one "web1."

Skipping the fence daemon options for a moment, the next section is the meat of your cluster definition: the clusternodes section, in which you define both nodes. This configuration file lives on both nodes, so that each knows about the other.

Each clusternode entry declares a fence method whose name is unique to that node. Below the clusternodes closing tag come the fencedevice sections, which tell each node how to power off the other. A server that supports IPMI is the best option, and the configuration is quite simple: you tell it which IP address the IPMI interface answers on and how to log in. To avoid putting the password in cluster.conf itself, you can point it at a root-owned script that echoes the password.

Also notice that we specified two_node in the configuration. This is required because, generally, a cluster will not be 'quorate' unless a majority of nodes agree on the cluster's state. When one of only two nodes fails, the survivor is not a majority, so this option lets the cluster function with just two nodes. This is all that is required to configure a basic cluster.
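Putting the pieces above together, a minimal two-node cluster.conf looks roughly like this. The node names, IP addresses, login, and script path are all placeholders; fence_ipmilan is the IPMI fence agent shipped with RHCS, and passwd_script points at the root-owned password script mentioned above:

```xml
<?xml version="1.0"?>
<cluster name="web1" config_version="1">
  <cman two_node="1" expected_votes="1"/>
  <fence_daemon post_join_delay="3"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="node1-ipmi">
          <device name="node1-ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="node2-ipmi">
          <device name="node2-ipmi"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="node1-ipmi" agent="fence_ipmilan"
                 ipaddr="192.168.1.10" login="admin"
                 passwd_script="/root/ipmi-passwd"/>
    <fencedevice name="node2-ipmi" agent="fence_ipmilan"
                 ipaddr="192.168.1.11" login="admin"
                 passwd_script="/root/ipmi-passwd"/>
  </fencedevices>
</cluster>
```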

Run 'service cman start' on each node, and everything should start properly. You can check 'clustat' or 'cman_tool nodes' to verify the health of the cluster. If one of the required daemons did not start, the cluster will not report "Quorate."
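The startup and verification steps, in order:

```shell
service cman start     # run on each node
clustat                # overall cluster state; look for "Quorate"
cman_tool nodes        # membership as cman sees it
```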

GFS Configuration

First, we need to configure CLVM so that we can use LVM with our GFS. Enabling CLVM is as easy as setting 'locking_type = 3' in /etc/lvm/lvm.conf.
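The relevant lvm.conf fragment looks like this:

```
# /etc/lvm/lvm.conf
global {
    locking_type = 3    # 3 = clustered locking through clvmd
}
```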

Next, create an LVM volume group and volume as you normally would, but use the shared block device. If you're using DRBD, you will likely use /dev/drbd0. I created a physical volume, then a volume group called vg01, then a logical volume on top called web1, giving: /dev/vg01/web1.
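As a sketch, using the DRBD device as the shared block device:

```shell
pvcreate /dev/drbd0                 # substitute your shared block device
vgcreate vg01 /dev/drbd0
lvcreate -n web1 -l 100%FREE vg01   # yields /dev/vg01/web1
```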

Finally, we need to create the file system:

gfs_mkfs -t web1:mygfs -p lock_dlm -j 2 /dev/vg01/web1

The name given with -t must be the name of the cluster, followed by a colon and whatever you want to call this file system; only members of the web1 cluster will be allowed to mount it. The -p option sets the lock type to the distributed lock manager, -j 2 requests two journals (one per node, since this is a two-node cluster), and the final argument is the volume to create the file system on. If you expect to add more nodes in the future, you must create enough journals at this point: each node needs its own.

Wrapping Up

We are ready to start using the file system now. Start the 'clvmd' and 'gfs' services on both nodes. Now you can mount the file system with '-t gfs' to specify the type as GFS. That is it!
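On both nodes, that amounts to the following; the mount point /mnt/web is just an example:

```shell
service clvmd start
service gfs start
mkdir -p /mnt/web
mount -t gfs /dev/vg01/web1 /mnt/web
```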

Be sure to set the cman, clvmd, and gfs services to start on boot, and don't forget to add the file system to /etc/fstab so it is mounted as well. Get familiar with the clustat and gfs_tool commands; they are the first places to look when something goes awry.
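The boot-time pieces, again with /mnt/web as a placeholder mount point:

```shell
chkconfig cman on
chkconfig clvmd on
chkconfig gfs on
# /etc/fstab entry; the gfs init script mounts gfs entries at boot:
#   /dev/vg01/web1  /mnt/web  gfs  defaults  0 0
```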

Do not expect GFS to be lightning fast. It is normal to experience pauses when accessing the file system if one node is performing a lot of writes. For a Web cluster, where the data is read far more often than data is written, this is rarely an issue. If large delays occur, first check the health of all components, and then evaluate what is being written. The most common cure for slowness issues is to ensure HTTP session data is not written to the GFS volume.

Charlie Schluting is the author of Network Ninja, a must-read for every network engineer.