SVLUG: The Story of Gluster

The name “Gluster” was derived from the words “GNU” and “cluster.” It has no relation to the Lustre filesystem; in fact, the two have opposite overall architectures.

GlusterFS is a GPLv3 distributed network filesystem that runs as a user-mode service on Linux across a network of servers (conceptually similar to Google’s GFS). Red Hat bought Gluster, Inc. in 2011 and markets it as “Red Hat Storage.” By running in user mode and reusing existing Linux features and modules, GlusterFS gained reliability in months instead of the usual 10 years for other filesystems.
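
Setting up a volume illustrates the user-mode, commodity-server model: bricks are just directories on ordinary Linux filesystems, and clients mount the volume over FUSE. A sketch of the standard CLI flow — hostnames, paths, and the volume name below are placeholders:

```shell
gluster peer probe server2                      # form a trusted pool from server1
gluster volume create myvol replica 2 \
    server1:/data/brick1 server2:/data/brick1   # one brick (plain directory) per server
gluster volume start myvol
mount -t glusterfs server1:/myvol /mnt/gluster  # FUSE client mount; any pool member works
```

Any server in the pool can answer the mount; there is no separate metadata server to point at.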

– after that, he still wanted to do Open Source projects, preferably without bureaucratic encumbrances
– got some angel funding; a seismic-data company also paid $500,000 for a 3-month project adapting HPC code to replace an IBM Regatta system, followed by a storage contract for 1.2 PB delivered over 6 months
– audience member: “In 10 years 1 PB will fit on an SD card.”
– GlusterFS is in some ways architecturally the opposite of VMware: GlusterFS is userland code.
– in 2006, large companies such as Lehman started appearing on the mailing list, to some surprise and awe
– the company was originally called Z Research and was renamed Gluster, Inc. for a clearer brand name
– no in-house test storage hardware, so they developed on customer hardware!
– initially still did other paying work (embedded kernel work, web development, etc.) to reduce the burn rate, but it proved too distracting

Traditional complex method      Newer, simpler method
FC                              HTTP, sockets
modified BSD OS                 Linux / user space / C, Python, Java
appliance-based                 application-based

– Google mixes the application and GFS: the app generates 64 MB chunks and GFS manages the metadata; too complicated for general use
– GlusterFS is a distributed storage OS in user space
– creates the container without knowledge of the filesystem (POSIX, ACLs, etc.), because there is no known common user pattern for storage
– lots of general C programmers were available to recruit, but no filesystem developers without kernel baggage
– in 2008–2009 they added too many features to actually test
– VCs contacted them and invested $15 million total across A and B rounds; the VCs’ storage “experts” said it was crazy, but users said it was awesome
– the lowest layer is a native filesystem like ext3 or XFS, which makes it idiot-proof
– use extended attributes for metadata
– features: block access, replication, striping, an elastic hashing algorithm
– striping is supported by cleverly using sparse files with a different offset on each server
– read server choice based on fastest response
– every directory has its own hash space
– good default behavior when adding servers (no thundering herd)
– striping is good for hotspot files or files too big for 1 volume, like saving HPC results
– a unified file and object protocol for object storage is coming
– there is a pathinfo command that can query the extended attribute; combined with ssh it could be used for a fake MapReduce
– GET and PUT at command line
– GlusterFS is most heavily used for lots of files containing unstructured data
– 3.3 will have faster healing operations, better granularity for 100 VMs, KVM support, etc.
– currently shared-nothing, but with a little sharing, healing can be faster
– an HDFS clone mode is coming
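
The elastic hashing above can be sketched roughly: each directory’s hash space is split into ranges, one per brick (GlusterFS records the layout in extended attributes), and a file’s name is hashed into that space to locate it without any metadata server. A simplified Python sketch — the even range split and the hash function (MD5 here; GlusterFS actually uses a Davies-Meyer hash) are stand-ins for illustration:

```python
import hashlib

def brick_ranges(num_bricks, space=2**32):
    """Split the 32-bit hash space into one contiguous range per brick.
    In GlusterFS each directory stores its own layout in extended
    attributes; here we just compute an even split."""
    step = space // num_bricks
    return [(i * step, space if i == num_bricks - 1 else (i + 1) * step)
            for i in range(num_bricks)]

def pick_brick(filename, ranges):
    """Hash the file name (not the full path) and return the index of
    the brick whose range contains the hash value."""
    h = int.from_bytes(hashlib.md5(filename.encode()).digest()[:4], "big")
    for i, (lo, hi) in enumerate(ranges):
        if lo <= h < hi:
            return i

ranges = brick_ranges(4)
print(pick_brick("report.txt", ranges))  # deterministic brick index, 0..3
```

On a live mount, `getfattr -n trusted.glusterfs.pathinfo <file>` is the extended-attribute query behind the pathinfo trick mentioned above.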
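
The sparse-file striping trick can also be sketched: every server holds a file of the full logical size, each stripe is written at its natural logical offset on its brick, and the gaps remain holes that consume no disk space, so offsets agree across servers. A toy Python simulation using local files as stand-in bricks (the STRIPE and BRICKS values are arbitrary):

```python
import os, tempfile

STRIPE = 4   # stripe unit in bytes (tiny for the demo)
BRICKS = 3

def write_striped(data, paths):
    """Write `data` round-robin across bricks. Each brick file is sparse:
    a stripe lands at its *logical* offset, and the gaps between a
    brick's stripes stay as holes."""
    files = [open(p, "wb") for p in paths]
    for off in range(0, len(data), STRIPE):
        f = files[(off // STRIPE) % BRICKS]
        f.seek(off)                      # seek past the hole
        f.write(data[off:off + STRIPE])
    for f in files:
        f.truncate(len(data))            # every brick reports the full size
        f.close()

def read_striped(paths, size):
    """Reassemble the logical file by reading each stripe from its brick."""
    files = [open(p, "rb") for p in paths]
    out = bytearray()
    for off in range(0, size, STRIPE):
        f = files[(off // STRIPE) % BRICKS]
        f.seek(off)
        out += f.read(min(STRIPE, size - off))
    for f in files:
        f.close()
    return bytes(out)

tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, f"brick{i}") for i in range(BRICKS)]
data = b"abcdefghijklmnopqrstuvwxyz"
write_striped(data, paths)
assert read_striped(paths, len(data)) == data
```

Because every brick file has the same apparent size and offsets, any single stripe can be located without consulting the other servers.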

Data storage models:
– objects
– file
– block
– structured data
– NoSQL
– semi-structured data

– Red Hat bought Gluster, Inc. for about $136 million in October 2011, when it had about 60 employees. Now there are about 40 engineers working on GlusterFS at Red Hat. AB chose Red Hat over other suitors because of its commitment to Open Source and Linux.

A dozen people attended dinner afterwards:

– it was tough hiring at Gluster, Inc., since the concept of doing filesystems in userland confused a lot of developers and managers
– AB’s philosophy is that the Open Core model doesn’t serve end users well, since all users, not just licensees, need “extras” like user-friendly management programs
– companies seem happy to pay for GlusterFS support, one reason being the lack of in-house storage engineers
– lots of discussion about Illumos (OpenSolaris fork), ZFS and containers
– take a look at Nexenta
– an efficient WAN replication method with GlusterFS is to use the marker framework (a queue kept in extended attributes) to feed rsync a list of changed files; this scales better than inotify
– no storage tiering yet for incoming/outgoing hotspot files
– Red Hat is pushing XFS heavily internally, and has hired the available ex-SGI XFS developers
– some checksumming is done in GlusterFS, but there is no end-to-end checksumming; performance and demand need to be evaluated first
– historically, Linux adoption and community interest in India lagged Western countries due to slow Internet connections (it was often more practical to install Linux from a magazine CD-ROM than to attempt large downloads) and the relatively high cost of computers compared to local salaries
– AB got started in programming on a Spectrum microcomputer, and progressed over time to fixing minor bugs in Linux network drivers, culminating in GlusterFS