An Appliance Built Exclusively for Galaxy

By Allison Proffitt

July 1, 2013 | Galaxy, the well-loved open source tool for data-intensive biomedical research, is getting some new gear. At the Galaxy User Group meeting in Oslo, Norway, BioTeam announced a new hardware appliance built specifically for Galaxy, the SlipStream Appliance: Galaxy Edition. Under its agreement with the Galaxy Project, BioTeam will be the exclusive appliance vendor for Galaxy for the next two years.

SlipStream Galaxy is a hardware appliance consisting of 16 Intel cores, 100 GB of solid state drive, 384 GB of memory, and 16 TB of usable storage space. Galaxy is pre-installed and configured. The appliance sells for just under $20,000.

“Galaxy is probably the most successful open source project in the NGS space,” Stan Gloss, Founding Partner and CEO, BioTeam, told Bio-IT World.

Galaxy describes itself as an open, web-based platform for accessible, reproducible, and transparent computational biomedical research. According to the Galaxy Wiki, the platform strives to be:

Accessible: Users without programming experience can easily specify parameters and run tools and workflows.

Reproducible: Galaxy captures information so that any user can repeat and understand a complete computational analysis.

Users can access Galaxy in one of three ways, explains James Taylor of Emory University, one of the co-founders of the Galaxy project. The first is the public service hosted at Penn State, which has about 35,000 users, Taylor said. “The number of analyses continues to hover around 150,000 jobs per month, and that’s because of resource constraints.” At the Penn State instance, access is limited to 250 GB of data and 4 concurrent jobs. “Frankly, it’s always going to be a challenge to keep up with community demand,” Taylor says.

As an alternative, users can deploy the software fairly easily on the Amazon cloud, Taylor says. It’s a good option, but there are data transfer challenges, costs associated with data storage on the cloud, and—for some users—privacy concerns.

The final option is local instances—Galaxy is downloaded and installed locally. It’s a “really good option for lots of use cases,” Taylor says. “It is the same Galaxy, but you keep [on site] all the tools, all the data necessary to actually do your analysis. That’s where this appliance model is very attractive.”

But installing Galaxy locally can be challenging, says Gloss. Installation can take 2-3 weeks, “with some knowledge,” he says. “That’s a lot of work.”

That’s where BioTeam’s expertise comes in, Gloss explains. BioTeam has over 400 clients in the NGS space, and has installed Galaxy for many. The hardware appliance was born out of those experiences. “Everything we do comes out of consulting work… Our innovation comes out of multiple consulting engagements doing the same things over and over,” Gloss said. “We’re trying to consolidate those best practices.”

The goal was to pack a lot of power into a small, easy-to-use package, Gloss says.

The appliance looks like a computer tower, with a small footprint, but the nodes are server-class, “what you’d find in a data center,” he says. The appliance has a lot of memory, so that computations can be run very efficiently. “We were shooting for something simple, fast, and efficient,” Gloss says.

In a whitepaper released with the announcement, BioTeam reports performance benchmarks for SlipStream Galaxy. For example, the appliance did whole-genome mapping with Bowtie 2 in 2:44; it finished RNA-seq mapping with TopHat 2 in 1:24.

Test Drive

| Tools | Task | Data | Runtime |
|---|---|---|---|
| Bowtie 2 | mapping whole human genome | 204 million paired-end 100bp Illumina reads | 2 hours 44 minutes |
| SAMTools | SAM-BAM conversion | 127GB SAM (41GB resulting BAM) | 2 hours 7 minutes |
| TopHat 2 | RNA-Seq mapping | 24 million 100bp Illumina reads | 1 hour 24 minutes |
| Cufflinks 2 | Differential Expression Analysis | 4.3 GB SAM File | 11 minutes |
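For readers who want to compare these runtimes with their own hardware, the benchmark figures can be turned into rough rates. The short Python sketch below is our own back-of-the-envelope arithmetic, not a calculation from the BioTeam whitepaper; the function names are illustrative, and it assumes the read counts in the table are total reads (the source does not say whether "204 million paired-end" counts reads or pairs).

```python
# Figures copied from the benchmark table above. The derived rates are our own
# arithmetic and are not numbers reported by BioTeam.
BENCHMARKS = {
    "Bowtie 2": {"reads_millions": 204, "minutes": 2 * 60 + 44},  # 2 h 44 min
    "TopHat 2": {"reads_millions": 24, "minutes": 1 * 60 + 24},   # 1 h 24 min
}

SAM_GB = 127  # size of the SAM input to the SAMTools conversion
BAM_GB = 41   # size of the resulting BAM


def reads_per_minute(reads_millions, minutes):
    """Return mapping throughput in millions of reads per minute."""
    return reads_millions / minutes


def compression_ratio(sam_gb, bam_gb):
    """Return how many times smaller the BAM is than the SAM."""
    return sam_gb / bam_gb


for tool, bench in BENCHMARKS.items():
    rate = reads_per_minute(bench["reads_millions"], bench["minutes"])
    print(f"{tool}: about {rate:.2f} million reads per minute")

ratio = compression_ratio(SAM_GB, BAM_GB)
print(f"SAM to BAM: about {ratio:.1f}x smaller")
```

These rates describe only the specific datasets and tool versions in the table, so they are best treated as a sanity check rather than a prediction for other workloads.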

Edward DeLong, Professor of Biological Engineering at MIT, has been using the SlipStream Galaxy since February in tandem with an Illumina MiSeq instrument. “For us it’s been really enabling,” DeLong told Bio-IT World.

“One of the ways that we’ve been using the SlipStream has been really as a handshake between that machine and the sequencer,” he said. “Having that handshake between the data management side of the house and the data production from the sequencers is super important.”

DeLong is particularly pleased that everything is in one place, from raw data storage to the analytical tools, in particular Galaxy. “It’s all one platform. In the past we might have used three or four different computer systems, each with a different operating system and different kinds of management challenges associated with it.” In fact, DeLong says the SlipStream’s large memory enables it to easily handle jobs that his team used to send to a computer cluster.

DeLong considers his lab “small time users”; they are generating no more than 100 billion Illumina-sized sequences a week. But the SlipStream appliance is handling their workload plus some.

“We do a lot of collaboration, so we’re getting data from the larger Illumina machines and larger datasets, which isn’t coming from our machine, it’s coming from the outside world, but we’re ingesting it into the appliance.”

The Right Users

The appliance option has been “the missing piece for years,” Taylor says. “We’ve seen the need, but we’ve just reached a point where there’s a critical mass of users… [Now] is the right time to do it.”

Gloss sees non-commercial, small academic labs like DeLong’s as the best fit for the appliance: users who value ease of use and speed of installation, and who “don’t want to spend all their time building,” Gloss said.

BioTeam is tracking the proliferation of benchtop and desktop sequencers, he says. By 2015, Gloss predicts, 80% of sequencers sold will be small systems going into individual labs, hospitals, and clinics. But those environments are not likely to have access to large technical support staffs or dedicated servers. “Desktop sequencers need desktop systems next to them,” Gloss contends.

SlipStream Galaxy is for those users, Gloss said. For users who prefer to be on the cloud, Gloss sees the appliance as a “gateway.” “You can test or build a workflow on the appliance, perfect it, then launch it to the cloud,” he says.

BioTeam is selling systems now to early access users. “We’ll work with them to perfect the processes, installation, and implementation, shaping what the general release product will be,” Gloss said. He expects SlipStream Galaxy to be more widely available later in the year.

Open and Community Driven

The SlipStream appliance makes no changes to the Galaxy software. “We’re not touching or modifying anything of the operating system program at all,” Gloss said. The appliance offers optional value for users who would like dedicated hardware and would like to avoid installation.

Taylor stresses that there continues to be an emphasis on open access and community-building that has been foundational to Galaxy from the start.

“With Galaxy we’ve really been trying to build something that’s completely open. All the infrastructure to support everything, the tools that go into Galaxy Tool Shed—all that stuff is open. We’re trying to facilitate an open community,” says Taylor.

SlipStream Galaxy is embracing the open philosophy, Taylor says. The appliance offers commercial-grade support, but users aren’t locked in, and are free to choose alternative deployments. “And BioTeam will remain active in the community, participating in the community,” he continues. “So I think this is really different from a lot of other more commercial solutions where you’re providing these analyses, but you don’t get the benefit of the open platform and the community that comes with something like Galaxy.”

One of the challenges of that type of open community is the speed with which Galaxy changes. “The development version of Galaxy is being worked on on a daily basis, and we’re doing releases about five times a year now of a more stable, tested version of Galaxy,” Taylor says, a slowdown from earlier release schedules. Taylor’s lab is currently working on visualization and visual analytics and refining the Galaxy Tool Shed, where users can develop and contribute tools to the community, and other users can download them. “We are always adding new things to Galaxy.”

That dynamic environment could present a challenge for users who buy the appliance, but Gloss and his team have provided for automated updates to SlipStream Galaxy, so users always have access to the latest version.

“We’re using our own experience and using a program called Opscode Chef [for the updates],” Gloss says. “Every system will be built on its own ‘recipe’ and we’ll have the ability to provide seamless updates.” Using Chef will also provide a recovery option in case there is ever a problem. “We’ll use a custom recipe for each system,” Gloss said, “So if there’s a problem, it will be easy to help reimage the system as needed.”