Red Hat clustered storage goes beta

The first iteration of the Gluster clustered file system to go through the Red Hat annealing process is coming closer to market with the launch of the first beta of the tool since Shadowman acquired Gluster last October.

There was nothing wrong with having a product called GlusterFS – that name suggested it was part grid and part cluster, with a GNU license and a file system – but for whatever reason, Shadowman is calling it Red Hat Storage 2.0, which really doesn't tell you anything about the product at all.

Gluster was a spinoff of California Digital Corp, a niche supercomputer maker that in 2003 built a big cluster called "Thunder" for Lawrence Livermore National Laboratory out of Itanium processors and InfiniBand networking.

Two years later, Anand Babu Periasamy took a bunch of parallel file system experts with him to found Gluster, hoping to create a better alternative to the open source Lustre file system and IBM's closed source General Parallel File System. Gluster shipped its first prototype in 2007 and put out release 2.0 of GlusterFS in 2009.

The secret sauce in GlusterFS is what it calls an elastic hashing algorithm, which is an alternative to having a single, chokepoint metadata server at the heart of the file system as many clustered file systems do. GlusterFS 2.0 could scale to more than 500 x86 server nodes implementing the file system and spanning petabytes of capacity.
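To make the no-metadata-server idea concrete, here is a minimal sketch of hash-based file placement. It is not Gluster's actual elastic hashing algorithm: the MD5 hash, the equal-sized ranges, and the names `name_hash`, `assign_ranges`, and `locate` are all assumptions made for illustration. The point is only that any client holding the same layout can work out where a file lives by itself, without asking a central server.

```python
import hashlib

HASH_SPACE = 2**32  # assumed size of the hash space, for illustration only


def name_hash(filename: str) -> int:
    """Map a file name to a number in [0, HASH_SPACE) (MD5 is a stand-in hash)."""
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def assign_ranges(nodes):
    """Split the hash space into one contiguous range per storage node."""
    step = HASH_SPACE // len(nodes)
    ranges = []
    for i, node in enumerate(nodes):
        start = i * step
        # The last node absorbs the remainder so the ranges cover the whole space.
        end = HASH_SPACE if i == len(nodes) - 1 else start + step
        ranges.append((start, end, node))
    return ranges


def locate(filename, ranges):
    """Return the node whose range contains the file name's hash."""
    h = name_hash(filename)
    for start, end, node in ranges:
        if start <= h < end:
            return node
    raise ValueError("hash falls outside the layout")


layout = assign_ranges(["node-a", "node-b", "node-c"])  # hypothetical node names
print(locate("reports/q1.csv", layout))
```

In a scheme like this, adding a node changes the ranges, so some files end up hashing to a different node than the one that holds them and have to be moved to restore the layout.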

The file system can ride atop ext3, ext4, XFS, and other file systems on each server node, and it exposes a global namespace spanning the storage server nodes as an NFS or CIFS mount point.

Gluster runs on x86 servers on top of Linux and can talk to SATA and SAS disks and RAID controllers (if you have them or want belts with your replication suspenders).

Sarangan Rangachari, general manager of storage at Red Hat, tells El Reg that GlusterFS does not yet understand tiering on non-volatile storage, but that Red Hat's engineers are looking at how this might be used to goose the I/O performance of GlusterFS.

Rangachari also wants to remind everyone that GlusterFS has a native access client, called Fuse, which has some performance benefits compared to the NFS and CIFS mount points, and that Red Hat is now trying to quantify what these performance benefits are as it hardens the code for Red Hat Storage 2.0.

At the moment, Rangachari says, the Fuse client can deliver about twice the oomph serving up files compared to NFS and CIFS – but of course the benefit of NFS and CIFS is that Windows, Linux, Unix, and other operating systems speak these. They have to be taught to speak Fuse.

Since acquiring Gluster for $136m in cash last fall, Red Hat has open sourced all of the GlusterFS code, which had been developed under an open-core philosophy until then, at the Gluster.org community. There are over 2,000 members in the GlusterFS community, and they put together a GlusterFS 3.3 upstream release that Shadowman grabbed to create Red Hat Storage 2.0, the future commercially supported release. (The current stable upstream release is GlusterFS 3.2.5.)

With the Red Hat Storage 2.0 beta (which is a private beta at the moment, by the way), a number of different features are being rolled out. You can deploy it bare-metal on any Linux server running RHEL 6.X and using any file system you please – this was rolled out last December and has been on sale in North America since that time – or you can grab a software appliance that runs atop the company's Enterprise Linux 6.2 distro and the XFS underlying file system on the server nodes.

In addition, as El Reg told you in February, there is also a virtual software appliance that lets the GlusterFS clustered file system run across virtual machine nodes on Amazon's EC2 compute cloud and virtual storage on its Elastic Block Store (EBS) to create a scale-out NAS.

With the 2.0 release of the commercial-grade GlusterFS, Red Hat is adding a Hadoop connector that will allow for MapReduce algorithms written for the Apache Hadoop data muncher to run across data stored on GlusterFS.

The beta also includes a new feature called unified file and object storage, which means you can save something as a file and another application or user can retrieve it as an object (as you would from a storage cloud such as Amazon's S3 service), or store it as an object and have it retrieved as a file (as you would from NFS or CIFS). "The beauty of this thing is that it is completely hidden from the users and applications," says Rangachari.
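One way to picture file and object access landing on the same data is a toy store in which an object named by container and key is simply a file at the matching path. This is a sketch under that assumption, not Red Hat's implementation; the `UnifiedStore` class, its methods, and the `demo` helper are hypothetical names invented for illustration.

```python
import tempfile
from pathlib import Path


class UnifiedStore:
    """Toy store: object (container, name) <-> file at root/container/name."""

    def __init__(self, root):
        self.root = Path(root).resolve()

    def _path(self, container, name):
        path = (self.root / container / name).resolve()
        # Refuse names such as "../../etc/passwd" that would leave the store.
        if self.root not in path.parents:
            raise ValueError("object name escapes the store root")
        return path

    def put_object(self, container, name, data: bytes):
        """Object-style write; the result is an ordinary file on disk."""
        path = self._path(container, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get_object(self, container, name) -> bytes:
        """Object-style read of whatever file sits at the matching path."""
        return self._path(container, name).read_bytes()


def demo():
    """Write as an object and read as a file, then the other way around."""
    with tempfile.TemporaryDirectory() as root:
        store = UnifiedStore(root)
        store.put_object("photos", "cat.txt", b"hello")
        read_as_file = (Path(root) / "photos" / "cat.txt").read_bytes()
        (Path(root) / "photos" / "dog.txt").write_bytes(b"world")
        read_as_object = store.get_object("photos", "dog.txt")
        return read_as_file, read_as_object


print(demo())
```

Neither side of the round trip needs to know how the data was written, which is the property Rangachari is describing.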

Red Hat Storage 2.0 has a number of new performance features, including faster rebalancing and tuning for NFSv3. The file system has been tweaked to be the underlying storage layer for compute clouds based on Red Hat Enterprise Virtualization, the commercial-grade KVM hypervisor that Red Hat peddles. The 2.0 beta release also has a number of security and self-healing enhancements.

Shadowman is not saying when (or if) Red Hat Storage 2.0 will go into public beta or when it is coming to market. Both depend on the feedback that the company gets during the private beta testing. Depending on that feedback, says Rangachari, there could be just one beta or multiple betas.

What Rangachari can commit to is that Storage 2.0 will expand beyond its North American sales into Europe, and into other geographies after that, once it is ready for prime time. ®