Open source HPC file system gets startup

High performance computing – by which is meant traditional parallel supercomputing as well as data analytics and hyperscale cloudy infrastructure – is facing a looming file system and storage bottleneck, and Whamcloud, a startup backed by $10m in private funding and some of the top people behind the Lustre file system, wants to help.

And get a piece of the exascale action, of course.

Whamcloud is not saying much about its plans for the Lustre file system, and that is in part because Lustre is an open source project controlled by Oracle.

The Lustre parallel file system, which runs across clusters of storage servers to provide the high data bandwidth required by supercomputers, was developed through funding from the US Department of Energy in parallel (joke intended) with the ASCI supercomputing program that propped up the finances of IBM, SGI, and Intel in the 1990s.

The Lustre project was founded by Peter Braam when he was a researcher at Carnegie Mellon University. In 2001, he founded Cluster File Systems to provide commercial support for and fund further development of the parallel file system.

In September 2007, Sun Microsystems bought CFS to beef up its HPC business and to marry some of its Zettabyte File System (ZFS) capabilities to Lustre. Oracle, of course, bought Sun this past January, and now controls the open source Lustre project – inasmuch as anyone can control an open source project.

About half of the computers in the Top 500 rankings use Lustre, and the only practical alternative for big parallel clusters is IBM's General Parallel File System (GPFS), which is proprietary and not particularly cheap.

Brent Gorda, who was cutting the checks for the development of Lustre over at DOE many years ago, is Whamcloud's chief executive officer. Eric Barton, who is Whamcloud's chief technology officer, worked at Lawrence Livermore National Laboratory, where Lustre cut its teeth on nuke data, and eventually became a principal engineer for Lustre at Sun after the CFS acquisition.

Robert Read, who was in charge of the Lustre 2.0 project at Oracle and was the lead engineer at Sun after Barton, is the principal engineer at Whamcloud for its work on Lustre.

While Whamcloud is being vague about what its plans are, it is being upfront about what it will not do.

"We want to make it clear that we do not intend to fork the source code for Lustre," says Gorda emphatically. "We want to continue development on the Lustre code base with specific attention to high performance computing, and we will not allow competing interests for Lustre to pull it in other directions. We want to do development and get it back into Lustre."

Oracle is the gatekeeper for Lustre, but has not said much about its plans for the parallel file system. Oracle is doing a lot of work with Btrfs, a scalable B-Tree file system for Linux that is in the experimental stages now but included with the current SUSE Linux Enterprise 11 SP1 and the upcoming Red Hat Enterprise Linux 6. Oracle's plans for traditional HPC are unclear, and as such, so are its plans for Lustre.

But Gorda says Oracle is talking to Lustre customers and is attending Lustre user groups – more than it is doing with the OpenSolaris community at the moment.

Whamcloud's intention is to keep pushing Lustre so it can deal with multi-petaflops and, eventually, exaflops supercomputers. With disk drives not getting any faster (at least relative to CPU clocks and counts) and flash drives being too expensive, the only answer, says Gorda, is "to go wildly parallel" with storage systems.

The problem with Lustre is that it is a niche product, unlike Linux, and therefore does not lend itself to the kind of broad and inexpensive support model that Red Hat and Novell can offer. That is why Whamcloud is stepping up to the plate.

Gorda says Whamcloud can offer feature extensions to Lustre – for cloud computing, hence the company's name – that do not conflict with Oracle's plans for Lustre or its support revenue stream, and can generate some revenue from them. As far as the core Lustre product goes, any code created by Whamcloud relating to the traditional HPC market will remain open source.

The company is not going to try to create an "open core" variant of Lustre, in which Oracle and Whamcloud would work together on the Lustre code while Whamcloud sold feature extensions separately under subscription pricing that masks the fact that they are really closed source licenses.

That said, anything Whamcloud creates for Lustre outside of HPC is up for grabs – and up in the air until the company works out its relationship with Oracle and gins up its own product plans. ®