My experience with GlusterFS performance.

I have been using GlusterFS to replicate storage between two physical servers for two reasons: load balancing and data redundancy. I use it on top of a ZFS storage array, as described in this post, and the two technologies combined provide a fast and highly redundant storage mechanism. At the ZFS layer, or whichever file system you use, there are several features we can use to improve performance. For ZFS specifically, we can add SSDs for caching and tweak memory settings to get the most throughput possible out of a given system.

GlusterFS also offers several ways to improve performance, but before looking at those, we need to be sure that it is the GlusterFS layer that is causing the problem. If your disks or network are slow, what chance does GlusterFS have of giving you good performance? You also need to understand how the individual components behave under the load of your expected environment. The disks may perform perfectly well when you use dd to create one huge file, but what about when many users create many files at the same time? You can break performance down into three key areas:

Networking – the network between each GlusterFS instance.

Filesystem IO performance – the file system local to each GlusterFS instance.

GlusterFS – the actual GlusterFS process.

Networking Performance

Before testing the disk and file system, it’s a good idea to make sure that the network connection between the GlusterFS nodes is performing as you would expect. Test the network bandwidth between all GlusterFS nodes using iperf. See the iperf blog post for more information on benchmarking network performance. Run the tests over a period of several hours to minimise the effect of varying host and network load. If you make any network changes, re-test after each one to confirm it had the desired effect.
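The repeated testing described above can be scripted. This is a minimal sketch, assuming iperf3 is installed on both nodes and `iperf3 -s` is already running on the peer; the host name `gluster2` and the helper names `net_samples` and `mbit_to_mbyte` are hypothetical. Classic iperf accepts the same `-s`, `-c` and `-t` flags. The conversion helper is there because iperf reports megabits per second while dd reports megabytes per second, so the two figures need the same units before you compare them.

```shell
#!/usr/bin/env bash
# Repeated iperf3 runs between two GlusterFS nodes, plus a unit conversion for comparing with dd.
set -euo pipefail

# net_samples PEER [RUNS] [INTERVAL_SECONDS]
# One 60-second iperf3 test per run, each preceded by a timestamp.
# Defaults: 12 runs, 30 minutes apart, i.e. roughly six hours of samples.
net_samples() {
  local peer=$1 runs=${2:-12} interval=${3:-1800}
  local i
  for ((i = 1; i <= runs; i++)); do
    echo "== $(date '+%F %T') run $i"
    iperf3 -c "$peer" -t 60
    if (( i < runs )); then sleep "$interval"; fi
  done
}

# iperf3 reports Mbits/sec; dd reports MB/s. Divide by 8 to put both in megabytes.
mbit_to_mbyte() {
  awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 8 }'
}
```

For example, `net_samples gluster2 > iperf-gluster2.log` on the first node, then `mbit_to_mbyte 940` turns a 940 Mbits/sec result into 117.5 MB/s.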

Filesystem IO Performance

Once you have tested the network between all GlusterFS nodes, test the local disk speed on each machine. There are several ways to do this, but I find it’s best to keep it simple and use one of two tools: dd or bonnie++. Be sure to take GlusterFS replication out of the picture, for example by running the tests directly on the brick file system rather than through a GlusterFS mount, since it is only the disks and file system we are trying to test here. Bonnie++ is a freely available IO benchmarking tool. dd is a standard Unix command-line tool for copying and converting files and data streams. See this blog post for information on benchmarking the file system.
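A sketch of both local tests, assuming GNU dd and bonnie++ are installed and that the directory you pass is on the brick file system, not on a GlusterFS mount. The function names are hypothetical. Two cautions: if ZFS compression is enabled on the dataset, data read from `/dev/zero` compresses to almost nothing and the dd figure will be unrealistically high; and bonnie++ expects a test size of at least twice the machine's RAM so that caching does not dominate the result. The `-u nobody` user must be able to write to the directory.

```shell
#!/usr/bin/env bash
# Local brick benchmarks: a sequential dd write and a bonnie++ run that includes small-file tests.
set -euo pipefail

# dd_write DIR [SIZE_MB]: write SIZE_MB of data, print dd's summary line, then remove the file.
dd_write() {
  local dir=$1 size_mb=${2:-4096}
  local f="$dir/dd-test.$$"
  # conv=fdatasync makes dd flush to disk before reporting, so the page cache does not inflate the figure
  dd if=/dev/zero of="$f" bs=1M count="$size_mb" conv=fdatasync 2>&1 | tail -n 1
  rm -f "$f"
}

# bonnie_run DIR [SIZE_MB]: -s is the file size in MB; -n 128 adds create/stat/delete
# tests over 128*1024 small files, which is closer to "many users creating many files".
bonnie_run() {
  local dir=$1 size_mb=${2:-16384}
  bonnie++ -d "$dir" -s "$size_mb" -n 128 -u nobody
}
```

Run `dd_write /path/to/brick` and `bonnie_run /path/to/brick` on each node and keep the results alongside the network figures.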

Technology, Tuning and GlusterFS

Once we are certain that disk IO and network bandwidth are not the issue, or, more importantly, understand the constraints they impose in our environment, we can tune everything else to maximise performance. In our case, we are trying to maximise GlusterFS replication performance across two nodes.

We can aim for replication speeds approaching the slower of the two limits measured above: file system IO or network throughput.
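That rule of thumb is just a minimum of two measured figures, once both are in MB/s. The helper below is a hypothetical sketch of it. The optional third argument is an assumption about topology: if the client is a separate machine writing to both nodes of a replica pair, GlusterFS sends each write to both bricks over the client's link, so the usable network figure may be roughly divided by the number of copies.

```shell
#!/usr/bin/env bash
# Expected replication ceiling: the slower of local disk and network throughput, in MB/s.
set -euo pipefail

# replication_ceiling DISK_MBPS NET_MBPS [COPIES]
# COPIES defaults to 1; use 2 when a separate client writes both replicas over one link.
replication_ceiling() {
  awk -v d="$1" -v n="$2" -v c="${3:-1}" 'BEGIN {
    d += 0
    net = n / c
    if (d < net) print d " MB/s (disk-bound)"
    else print net " MB/s (network-bound)"
  }'
}
```

For example, a brick that writes at 410 MB/s behind a gigabit link measured at 117.5 MB/s gives `117.5 MB/s (network-bound)`, so tuning the disks further would not help replication speed.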

I have been using GlusterFS to provide file synchronisation over two networked servers. As soon as the first file was replicated between the two nodes I wanted to understand the time it took for the file to be available on the second node. I’ll call this replication latency.
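One rough way to measure replication latency is to write a file through the mount on the first node and poll for it on the second. This is a sketch, not a precise tool: it assumes both nodes' clocks are synchronised (for example with NTP), GNU `date` and `stat`, and that each node has the volume mounted; the function names and the `/mnt/gluster` path are hypothetical. The 10 ms poll interval limits the resolution, and the probe waits for the full file size rather than just the file name, because a file can become visible before all of its data has arrived.

```shell
#!/usr/bin/env bash
# Rough replication-latency probe: write a file on node 1, wait for it on node 2.
set -euo pipefail

now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

# On node 1: write a 1 MiB probe file through the GlusterFS mount and print when the write finished.
write_probe() {
  local dir=$1 name=$2
  head -c 1048576 /dev/urandom > "$dir/$name"
  now_ms
}

# On node 2: wait until PATH exists with at least SIZE bytes, then print the time.
# Fails after TIMEOUT seconds.
wait_probe() {
  local path=$1 size=${2:-1048576} timeout=${3:-30}
  local deadline=$(( $(now_ms) + timeout * 1000 ))
  until [ -e "$path" ] && [ "$(stat -c %s "$path")" -ge "$size" ]; do
    if [ "$(now_ms)" -ge "$deadline" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep 0.01
  done
  now_ms
}
```

Start `wait_probe /mnt/gluster/probe-001` on the second node first, then run `write_probe /mnt/gluster probe-001` on the first; the difference between the two printed times approximates the latency.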

As discussed in my other blog posts, it is important to understand what the limitations are in the system without the GlusterFS layer. File system and network speed need to be understood so that we are not blaming high replication latency on GlusterFS when it’s slow because of other factors.

The next thing to note is that replication latency is affected by the type of file you are transferring between nodes. Many small files will result in lower transfer speeds, whereas very large files will reach the highest speeds. This is because GlusterFS incurs a fixed overhead for each file it replicates (lookups, file creation and metadata updates on both nodes), so the larger the file, the smaller that overhead is relative to the time spent transferring the data itself.
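You can see this per-file overhead directly by writing the same amount of data as many small files and as one large file. A minimal sketch, assuming GNU `date`; the function name and defaults are hypothetical. Run it once against the GlusterFS mount and once against the brick's local file system: the gap between the two small-file times is roughly the overhead GlusterFS adds per file.

```shell
#!/usr/bin/env bash
# Compare writing COUNT small files with writing one file of the same total size.
set -euo pipefail

now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

# small_vs_large DIR [COUNT] [SIZE_KB]
small_vs_large() {
  local dir=$1 count=${2:-1000} size_kb=${3:-4}
  local work="$dir/sizetest.$$"
  local i t0 t1 t2
  mkdir -p "$work/small"

  t0=$(now_ms)
  for ((i = 0; i < count; i++)); do
    head -c "$((size_kb * 1024))" /dev/urandom > "$work/small/f$i"
  done
  sync
  t1=$(now_ms)

  # Same total number of bytes, written as a single file
  head -c "$((count * size_kb * 1024))" /dev/urandom > "$work/large"
  sync
  t2=$(now_ms)

  rm -rf "$work"
  echo "small_files_ms=$((t1 - t0)) large_file_ms=$((t2 - t1))"
}
```

For example, `small_vs_large /mnt/gluster 5000 4` writes 5000 files of 4 KB and then one 20000 KB file.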

As with all performance tuning, there are no magic values that work on every system. The GlusterFS defaults are chosen to give good performance across mixed workloads. To squeeze more performance out of GlusterFS, use an understanding of the parameters below and how they apply to your setup.

After making a change, be sure to restart all GlusterFS processes and begin benchmarking the new values.

GlusterFS specific

GlusterFS volumes can be configured with multiple settings. These can be set on a volume using the command below, substituting [VOLUME] for the volume to alter, [OPTION] for the option name and [VALUE] for the value to set.

gluster volume set [VOLUME] [OPTION] [VALUE]

Example:

gluster volume set myvolume performance.cache-size 1GB

Or you can add the option to the glusterfs.vol config file. This applies to older releases that read a hand-edited volume file; on releases managed by glusterd, volume files are regenerated whenever an option changes, so use gluster volume set instead.

vi /etc/glusterfs/glusterfs.vol
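When experimenting, it helps to confirm the value a volume is actually using and to know how to undo a change. The sketch below wraps `gluster volume set` with `gluster volume get` (available from GlusterFS 3.7) and `gluster volume reset`, which returns an option to its default. The function names are hypothetical; `myvolume` is the example volume used above.

```shell
#!/usr/bin/env bash
# Set a volume option, read back the value in effect, and restore the default when done.
set -euo pipefail

# set_and_check VOLUME OPTION VALUE
set_and_check() {
  local volume=$1 option=$2 value=$3
  gluster volume set "$volume" "$option" "$value"
  # 'gluster volume get' reports the value the volume is currently using
  gluster volume get "$volume" "$option"
}

# restore_default VOLUME OPTION: undo a change with 'gluster volume reset'
restore_default() {
  gluster volume reset "$1" "$2"
}
```

For example, `set_and_check myvolume performance.cache-size 1GB`, benchmark, then `restore_default myvolume performance.cache-size` if it did not help.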

performance.write-behind-window-size – the size in bytes to use for the per-file write-behind buffer. Default: 1MB.

performance.cache-refresh-timeout – the time in seconds a cached data file will be kept until data revalidation occurs. Default: 1 second.

performance.cache-size – the size in bytes to use for the read cache. Default: 32MB.

cluster.stripe-block-size – the size in bytes of the chunks a file is split into across the bricks of a striped volume; it has no effect on a purely replicated volume. Smaller values suit smaller files and larger values suit larger files. Default: 128KB.
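Since there are no magic values, a simple way to compare settings is to try several values of one option in turn and run the same benchmark after each. The sketch below does that and resets the option to its default at the end. The function name and its arguments are hypothetical; the benchmark command is whatever test you trust, such as a dd write through the GlusterFS mount. If you restart the GlusterFS processes between changes, as suggested above, add that step after the `volume set`.

```shell
#!/usr/bin/env bash
# Try several values of one volume option, benchmark each, then restore the default.
set -euo pipefail

# sweep_option VOLUME OPTION BENCH_CMD VALUE...
sweep_option() {
  local volume=$1 option=$2 bench=$3
  shift 3
  local value
  for value in "$@"; do
    gluster volume set "$volume" "$option" "$value" >/dev/null
    echo "== $option=$value"
    # Run the benchmark exactly as given so every value is measured the same way
    bash -c "$bench"
  done
  # Return the option to its default so later tests start from a known state
  gluster volume reset "$volume" "$option" >/dev/null
}
```

For example: `sweep_option myvolume performance.write-behind-window-size 'dd if=/dev/zero of=/mnt/gluster/t bs=1M count=1024 conv=fdatasync; rm -f /mnt/gluster/t' 1MB 4MB 8MB`.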

Other Notes

When mounting the storage for the GlusterFS layer, make sure it is configured for the type of workload you have.
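With ZFS underneath, as in this setup, much of that configuration is done with dataset properties. This is a sketch under assumptions: OpenZFS on Linux (where `xattr=sa` is available), a hypothetical brick dataset named `tank/gluster`, and a hypothetical function name. Which values suit you depends on your workload, so benchmark before and after.

```shell
#!/usr/bin/env bash
# Brick dataset properties on ZFS on Linux (dataset name is a placeholder).
set -euo pipefail

# tune_brick_dataset DATASET [RECORDSIZE]
tune_brick_dataset() {
  local ds=$1 rs=${2:-128K}
  # Skip an access-time update on every read
  zfs set atime=off "$ds"
  # Store extended attributes (GlusterFS relies on them heavily) with the file
  # instead of in separate hidden xattr directories
  zfs set xattr=sa "$ds"
  # Maximum block size: larger values suit large sequential files; 128K is the ZFS default
  zfs set recordsize="$rs" "$ds"
  zfs get atime,xattr,recordsize "$ds"
}
```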

When mounting your GlusterFS storage from a remote server on your local server, be sure to disable direct I/O so that the kernel's read-ahead and file system cache can be used. This is sensible for most workloads, where caching files is beneficial.
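With the native FUSE client, direct I/O is controlled by the `direct-io-mode` mount option. A minimal sketch; the server name `gluster1`, the mount point and the function name are placeholders.

```shell
#!/usr/bin/env bash
# Mount a GlusterFS volume with the native (FUSE) client and direct I/O disabled.
set -euo pipefail

# mount_gluster SERVER VOLUME MOUNTPOINT
mount_gluster() {
  local server=$1 volume=$2 mnt=$3
  mount -t glusterfs -o direct-io-mode=disable "$server:/$volume" "$mnt"
}

# Equivalent /etc/fstab entry (_netdev waits for networking before mounting):
# gluster1:/myvolume  /mnt/gluster  glusterfs  defaults,_netdev,direct-io-mode=disable  0 0
```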

When mounting the GlusterFS volume over NFS, use the noatime and nodiratime options to stop access-time updates on files and directories.
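A sketch of the NFS mount with those options, assuming GlusterFS's built-in NFS server, which implements NFSv3 (hence `vers=3`). The server name, mount point and function name are placeholders.

```shell
#!/usr/bin/env bash
# Mount a GlusterFS volume over NFS without access-time updates.
set -euo pipefail

# mount_gluster_nfs SERVER VOLUME MOUNTPOINT
mount_gluster_nfs() {
  local server=$1 volume=$2 mnt=$3
  # vers=3: the built-in Gluster NFS server speaks NFSv3
  mount -t nfs -o vers=3,noatime,nodiratime "$server:/$volume" "$mnt"
}
```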

I haven’t been working with GlusterFS for long so I would be very interested in your thoughts on performance. Please leave a comment below.