Thursday, April 19, 2012

This tutorial shows how to combine four single storage servers (running Ubuntu 11.10) into one large storage server (distributed storage) with GlusterFS. The client system (Ubuntu 11.10 as well) will be able to access the storage as if it were a local filesystem. GlusterFS is a clustered file system capable of scaling to several petabytes. It aggregates various storage bricks over an InfiniBand RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be made of any commodity hardware, such as x86_64 servers with SATA-II RAID and InfiniBand HBA.
Please note that, unlike replicated storage, this kind of storage (distributed storage) doesn't provide any high-availability features: if one server fails, the files stored on it are unavailable.
I do not issue any guarantee that this will work for you!

1 Preliminary Note

In this tutorial I use five systems, four servers and a client:

server1.example.com: IP address 192.168.0.100 (server)

server2.example.com: IP address 192.168.0.101 (server)

server3.example.com: IP address 192.168.0.102 (server)

server4.example.com: IP address 192.168.0.103 (server)

client1.example.com: IP address 192.168.0.104 (client)

Because we will run all the steps from this tutorial with root privileges, we can either prefix every command in this tutorial with sudo, or become root right now by typing

sudo su

All five systems should be able to resolve the other systems' hostnames. If this cannot be done through DNS, you should edit the /etc/hosts file so that it looks as follows on all five systems:
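As a minimal sketch, the entries for the five systems listed above could be generated like this. The short aliases (server1, client1, and so on) and the helper name print_hosts_entries are my own assumptions; any existing localhost lines in /etc/hosts should be kept as they are.

```shell
# Hypothetical helper: prints one /etc/hosts entry per system of this tutorial.
# The short aliases in the third column are an assumption, not a requirement.
print_hosts_entries() {
  cat <<'EOF'
192.168.0.100   server1.example.com     server1
192.168.0.101   server2.example.com     server2
192.168.0.102   server3.example.com     server3
192.168.0.103   server4.example.com     server4
192.168.0.104   client1.example.com     client1
EOF
}

# Review the output first, then append it to /etc/hosts on each system.
print_hosts_entries
```

Once the output looks right, it can be appended with print_hosts_entries >> /etc/hosts.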

2 Setting Up The GlusterFS Servers

GlusterFS is available as a package for Ubuntu 11.10, therefore we can install it as follows:

apt-get install glusterfs-server

The command

glusterfsd --version

should now show the GlusterFS version that you've just installed (3.2.1 in this case):

root@server1:~# glusterfsd --version
glusterfs 3.2.1 built on Jun 28 2011 07:43:56
Repository revision: v3.2.1
Copyright (c) 2006-2010 Gluster Inc.
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU Affero General Public License.
root@server1:~#

If you use a firewall, ensure that TCP ports 111, 24007, 24008, and 24009 through (24009 + number of bricks across all volumes) are open on server1.example.com, server2.example.com, server3.example.com, and server4.example.com.
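The upper end of the brick port range depends on the number of bricks, so a small calculation helps. The following sketch applies the rule quoted above; brick_port_range is a hypothetical helper name, and the count of four bricks assumes the single volume with one brick per server that this tutorial builds.

```shell
# Hypothetical helper: prints the brick port range as FIRST:LAST,
# i.e. 24009 through 24009 + total number of bricks across all volumes.
brick_port_range() {
  bricks="$1"
  echo "24009:$((24009 + bricks))"
}

# One volume with four bricks (one per server), as assumed in this tutorial:
brick_port_range 4
```

If ufw happens to be your firewall (an assumption, other firewalls work differently), a range in this FIRST:LAST form can be passed to it, for example ufw allow 24009:24013/tcp.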
Next we must add server2.example.com, server3.example.com, and server4.example.com to the trusted storage pool by running gluster peer probe once for each of them. Please note that I'm running all GlusterFS configuration commands from server1.example.com, but you can just as well run them from server2.example.com, server3.example.com, or server4.example.com, because the configuration is replicated between the GlusterFS nodes. Just make sure you use the correct hostnames or IP addresses.
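A minimal sketch of the probing step, assuming it is run on server1.example.com: the function probe_peers and the GLUSTER variable are hypothetical conveniences of mine (setting GLUSTER=echo gives a dry run that only prints the arguments); the underlying command is gluster peer probe.

```shell
# GLUSTER is a hypothetical override; set GLUSTER=echo for a dry run.
GLUSTER="${GLUSTER:-gluster}"

# Probe each given host so that it joins the trusted storage pool.
# Stops at the first probe that fails.
probe_peers() {
  for peer in "$@"; do
    "$GLUSTER" peer probe "$peer" || return 1
  done
}

# On server1.example.com:
# probe_peers server2.example.com server3.example.com server4.example.com
```

Afterwards, gluster peer status should list the three other servers as connected peers.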

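Next we create the distributed share named testvol on server1.example.com, server2.example.com, server3.example.com, and server4.example.com, using the /data directory on each server as a brick (the directory will be created if it doesn't exist). This is done with gluster volume create; the volume must then be started before clients can mount it.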
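The creation step could look roughly like the sketch below. The function names and the GLUSTER dry-run variable are hypothetical; the brick list simply pairs each of the four servers with /data, and transport tcp is assumed because this setup uses plain TCP/IP rather than InfiniBand.

```shell
# GLUSTER is a hypothetical override; set GLUSTER=echo for a dry run.
GLUSTER="${GLUSTER:-gluster}"

# Create a plain distributed volume named testvol with one brick per server.
create_testvol() {
  "$GLUSTER" volume create testvol transport tcp \
    server1.example.com:/data server2.example.com:/data \
    server3.example.com:/data server4.example.com:/data
}

# A volume must be started before clients can mount it.
start_testvol() {
  "$GLUSTER" volume start testvol
}

# On server1.example.com:
# create_testvol && start_testvol
```

Because no replica count is given, the result is a purely distributed volume, which matches the no-high-availability setup described at the beginning.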

By default, all clients can connect to the volume. If you want to grant access to client1.example.com (= 192.168.0.104) only, run:

gluster volume set testvol auth.allow 192.168.0.104

Please note that it is possible to use wildcards for the IP addresses (like 192.168.*) and that you can specify multiple IP addresses separated by commas (e.g. 192.168.0.104,192.168.0.105).
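When several clients need access, the comma-separated list can be assembled from individual addresses. This is only a sketch: join_by_comma, allow_clients, and the GLUSTER dry-run variable are hypothetical names, and 192.168.0.105 is just the example address from the note above.

```shell
# GLUSTER is a hypothetical override; set GLUSTER=echo for a dry run.
GLUSTER="${GLUSTER:-gluster}"

# Join all arguments with commas, e.g. "a b" becomes "a,b".
join_by_comma() {
  local IFS=,
  echo "$*"
}

# Restrict access to testvol to the given client addresses.
allow_clients() {
  "$GLUSTER" volume set testvol auth.allow "$(join_by_comma "$@")"
}

# allow_clients 192.168.0.104 192.168.0.105
```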
Running gluster volume info should now show the updated status, with auth.allow listed among the reconfigured options.

Instead of mounting the GlusterFS share manually on the client, you could modify /etc/fstab so that the share gets mounted automatically when the client boots.
Open /etc/fstab and append the following line:
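A sketch of such an fstab entry follows. The mount point /mnt/glusterfs is my assumption, since this section does not state one, and fstab_line is a hypothetical helper; the _netdev option tells the system to wait for the network before mounting.

```shell
# Assumed mount point; adjust it to wherever the share should appear.
MOUNT_POINT="${MOUNT_POINT:-/mnt/glusterfs}"

# Hypothetical helper: prints an fstab entry for a GlusterFS volume.
# Arguments: server hostname, volume name.
fstab_line() {
  printf '%s:/%s %s glusterfs defaults,_netdev 0 0\n' "$1" "$2" "$MOUNT_POINT"
}

# Prints the line to append to /etc/fstab on client1.example.com.
fstab_line server1.example.com testvol
```

The mount point directory has to exist on the client before the entry can be used; running mount -a afterwards is a quick way to check the entry without rebooting.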

Now let's check the /data directory on server1.example.com, server2.example.com, server3.example.com, and server4.example.com. You will notice that each storage node holds only a part of the files/directories that make up the GlusterFS share on the client, because a distributed volume stores each file on exactly one brick.