
Monday, November 14, 2011

RHEL 6 Part IV: Placing XFS into production and measuring performance

Making an XFS filesystem for a production environment

It's about time we saw some action beyond the first sysadmin impressions of RHEL 6 described in the previous article of the series. One of the first fundamental differences between RHEL 5 and 6 is support for XFS filesystem deployments. Why would you care about XFS? Simply put, apart from its multi-threaded performance, if you are an ext4 kind of guy and you are likely to store more than 16 TiB on a single volume, then XFS is your best choice (in theory ext4 can support filesystems of up to 1 EiB, but the accompanying filesystem utilities, and the support for those utilities, cap the supported volume size at 16 TiB).

Another kind of 'gotcha' (which I really dislike about RedHat) is that in RHEL 6 you should not take XFS support for granted, unless your license includes the fees paid for the appropriate layered product, called the "Scalable File System Add-On" (my own translation: "give us your money if you want filesystem support over 16 TiB" :-) ). If you have paid for a basic RHEL 6 license, registered your machine with RHN, mkfs.xfs is missing from your root path and a yum search xfsprogs returns nothing, you know that you need to look into your pocket and not your yum repository config.

If you do not want to spend money and are willing to risk running an XFS installation without support, head over to the nearest CentOS 6 repository, download the xfsprogs and xfsprogs-devel RPMs, yum install these two RPMs and you will be good to go.
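Assuming the two RPMs have already been downloaded to the current directory (filenames and versions are illustrative), the install step might look like this:

```shell
# Install the locally downloaded CentOS 6 packages; yum localinstall
# resolves any remaining dependencies from the configured repositories.
# --nogpgcheck is needed because the CentOS signing key is not imported
# on a stock RHEL box.
yum localinstall --nogpgcheck xfsprogs-*.rpm xfsprogs-devel-*.rpm
```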

I used a simple Direct Attached Storage setup: a Dell PowerEdge R815 server fitted with a PERC H800 6 Gb/s SAS controller, driving a single Dell MD1200 cabinet holding 12 x 2 TB 6 Gb/s Nearline SAS drives. Four of them were used for the purposes of the test, in a RAID 0 config. To be precise, for those of you familiar with the OMSA setup, here is the exact config as reported by the omreport storage vdisk OMSA command:
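For reference, the OMSA query itself looks roughly like this (the controller ID is an assumption; check yours with omreport storage controller first):

```shell
# List all virtual disks on PERC controller 0, including RAID level,
# stripe element size and member physical disks:
omreport storage vdisk controller=0
```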

At this point, I tagged the hardware-created vdisk (/dev/sdd) as an LVM physical volume, created my Volume Group and made a Logical Volume of 5 TiB on which to build my XFS filesystem (I am not using the full size of the PV, so that I can demonstrate XFS expansion later on). Now, let's build the actual XFS filesystem:
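The LVM and mkfs steps might look roughly like the following sketch. The 64 KiB stripe unit is an assumption (a common PERC default); set su to the stripe element size actually reported by omreport, and sw to the number of disks in the RAID 0 set:

```shell
# Tag the hardware vdisk as an LVM physical volume, build the VG and a 5 TiB LV:
pvcreate /dev/sdd
vgcreate VGEMBGalaxy /dev/sdd
lvcreate -L 5T -n LVembgalaxy VGEMBGalaxy

# su = RAID stripe element size, sw = number of data disks in the stripe set,
# so XFS can align its allocations with the underlying RAID geometry:
mkfs.xfs -d su=64k,sw=4 /dev/VGEMBGalaxy/LVembgalaxy
```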

This builds the XFS filesystem on top of the LVM Logical Volume (/dev/VGEMBGalaxy/LVembgalaxy). You might have noticed that the specified stripe unit (su) size and number of disks (sw) match the config of the H800 vdisk, as given earlier by the output of the omreport storage vdisk command. Good system practice dictates passing these parameters to the mkfs.xfs utility, in order to improve filesystem performance.

We are now ready to mount the filesystem, so we make sure the mountpoint exists and add an entry to /etc/fstab:
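A sketch of these two steps (the /storage mountpoint name is hypothetical; the mount flags are the ones discussed below):

```shell
# Create the mountpoint and append the fstab entry for the XFS volume:
mkdir -p /storage
cat >> /etc/fstab <<'EOF'
/dev/VGEMBGalaxy/LVembgalaxy  /storage  xfs  nobarrier,inode64  0 0
EOF
```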

Note the nobarrier and inode64 flags. The first (which also applies to ext4 filesystems) buys you a bit of extra performance, if and only if your disk controller cache is battery backed (and the battery is good AND you have a UPS to shut the system down properly). The inode64 flag pursues the same performance objective on large filesystems, although it can break some older applications (old NFSv3 clients that import the XFS partition over NFS, or locally-writing applications whose binaries are more than 4-5 years old). A mount -a later and you should see the XFS filesystem accessible:
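A quick way to verify the result (mountpoint name hypothetical):

```shell
# Mount everything listed in /etc/fstab that is not yet mounted,
# then confirm the XFS volume shows up with its full size:
mount -a
df -h -t xfs
```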

Now, let's say that all is good, you go ahead and use the filesystem, and after some time your users fill up the volume. How about expanding it by, say, a couple of TiB, to give them some breathing space? Sure, quite easily, without even taking the filesystem off-line (unmounting it). First, we extend the LV:
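The online expansion boils down to two commands (sizes and the /storage mountpoint are illustrative); note that xfs_growfs works on the mounted filesystem, and XFS can only grow, never shrink:

```shell
# Grow the Logical Volume by 2 TiB out of the free space left in the VG:
lvextend -L +2T /dev/VGEMBGalaxy/LVembgalaxy

# Grow the XFS filesystem to fill the enlarged LV, while it stays mounted:
xfs_growfs /storage
```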

I employ iozone, a well-tested filesystem benchmarking tool, first on an ext4 volume and then on the newly constructed XFS volume. Both volumes are configured with exactly the same RAID config (RAID 0 across 4 disks), they run on the same type of hardware, and they have the same filesystem block size (4 KiB).

The mount flags for the ext4 filesystem were:

rw,noatime,nobarrier,data=writeback

and for the XFS filesystem:

rw,nobarrier,inode64

The benchmarks are run in the following order:

first the ext4 benchmark is run

a reboot of the box follows to make sure we do not have any VFS cache/memory issues affecting the results

the XFS volume benchmark is run.

During both tests all other I/O activity on the box is excluded (no user logins, and services are kept to a minimum; you might also find it useful to disable SELinux. There is always the option of running the benchmarks in single-user mode, but I wanted to monitor the box remotely as I was writing this).

The entire procedure is repeated five times and the arithmetic mean of the results is reported in the result graphs.

The iozone manual will help you decipher the meaning of the switch options; briefly, the command carries parameters that ensure we get meaningful results, given the server's RAM size, the processor cache size and the test conditions. The volume_file_path is the absolute path on the volume where the test file should reside (the volume/partition under test).
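A representative invocation, not the exact one used here (all sizes are illustrative assumptions; the maximum file size should be at least twice the box's RAM so that cache effects do not dominate):

```shell
# Full automatic mode over file sizes from 512 MiB up to 16 GiB,
# writing an Excel-ready report; the test file lives on the volume under test:
iozone -a -n 512m -g 16g -f /storage/iozone.tmp -R -b iozone-results.xls
```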

Please note that these tests take weeks to complete properly, so should you wish to perform similar tests on a system, make sure you schedule enough downtime to complete them without additional activity on the box.

Here are the results.

These should make the difference clear: in summary, as far as sequential I/O performance is concerned, XFS is better. For random I/O performance (the smaller figures on the right), XFS also delivers better speed for random writes.

Have you been on ext4 so far, while your single-volume data production grows, and want a scalable solution with decent performance? Think again and consider XFS!

3) No, I did not change the default queue scheduler. The idea was to test a default config of RHEL 5 versus a default config of RHEL 6. Obviously, by tuning the schedulers you could improve the results for certain workloads.