Bioinformatics tools: Stacks

Sep 25, 2012

This is my first post about bioinformatics tools, but I will try to post about others too, such as MrBayes, Structure, BioPerl, BioJava, and maybe some next-generation sequencing tools.

As I am more of a computer geek than a bioinformatician, I will focus on the requirements for installing these tools on your machine and for running them on clusters. The instructions assume an intermediate knowledge of *nix operating systems and, sometimes, a bit of programming experience.

I will be using tutorials available on the Internet and hosting my code at GitHub/kinow. Hammer time!

Stacks is hosted at the University of Oregon, so if you are googling for it right now, a query including both ‘stacks’ and ‘oregon’ will probably give you better results (good to know when you forget the web address). This is the Stacks homepage: http://creskolab.uoregon.edu/stacks/.

"Stacks is a software pipeline for building loci out of a set of short-read sequenced samples. Stacks was developed for the purpose of building genetic maps from RAD-Tag Illumina sequence data, but can also be readily applied to population studies, and phylogeography."

I will follow the tutorial “building mini-contigs from paired-end sequences” to demonstrate how to use Stacks, and for each step executed I’ll post the directory and database sizes, as well as the processing time. My hardware is an Intel i5 quad core 2.30 GHz with 6GB of memory, running Debian 6.
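For the directory sizes, du -sh on each directory is the quickest route. As a portable alternative, here is a minimal Python sketch that sums the sizes of all regular files under a directory. The function names dir_size_bytes and human are hypothetical helpers, not part of Stacks, and because the sketch adds up apparent file sizes and skips symbolic links, its numbers may differ slightly from du, which counts allocated disk blocks.

```python
import os


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root.

    Symbolic links are skipped so nothing is counted twice. Unlike du,
    this sums apparent file sizes rather than allocated disk blocks.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human(n):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024
```

Calling human(dir_size_bytes(path)) after each step, with path being whichever directory that step writes to, gives a number comparable across steps.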

Installing on your local computer

The installation is quite simple. However, Stacks comes with a web application, and some steps of its pipeline upload data to a database. So in this setup I'm using an Apache HTTP Server and a MySQL database. Here's a list of things you should have installed on your machine:

PHP with the MDB2 module installed (pear install MDB2 does the trick most of the time)

MySQL database (Stacks comes with an SQL script for MySQL; I'm not sure how hard it would be to adapt this script and the Perl utilities to Oracle, DB2, etc.). My version: Ver 14.14 Distrib 5.5.24, for debian-linux-gnu (x86_64) using readline 6.2

Perl (Perl 5 is fine; I didn't test with Perl 6). My version: This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
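Before building, a small script can tell you which of the command-line tools from this list are missing from your PATH. This is a sketch: the executable names (mysql, perl, php, pear) are assumptions based on typical Debian packages, and finding a binary says nothing about its version or about the MDB2 module itself (pear list shows which PEAR packages are installed).

```python
import shutil

# Executables assumed to correspond to the prerequisites listed above.
# The names are typical for Debian; adjust them for your distribution.
REQUIRED = ["mysql", "perl", "php", "pear"]


def missing_tools(names):
    """Return the subset of names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```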

Install Stacks with the classic ./configure, make, make install sequence. This puts several utilities and Perl scripts on your PATH. To test whether the installation worked, try running ustacks. You may also need to update your Apache HTTP Server settings to add a virtual directory pointing to the Stacks web folder. On my computer this folder is located at /usr/local/share/stacks/php.
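The build sequence can also be scripted so that it stops at the first failing step and then checks whether ustacks ended up on the PATH. A minimal sketch, assuming you point cwd at the unpacked Stacks source directory and that make install has the permissions it needs (it often requires root); run_steps and stacks_installed are hypothetical helper names.

```python
import shutil
import subprocess

# The classic autotools sequence described above.
BUILD_STEPS = [["./configure"], ["make"], ["make", "install"]]


def run_steps(steps, cwd="."):
    """Run each command in order inside cwd.

    check=True makes subprocess.run raise CalledProcessError as soon as
    a command exits with a non-zero status, so later steps are skipped.
    """
    for cmd in steps:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, cwd=cwd, check=True)


def stacks_installed():
    """Return True if the ustacks executable can be found on the PATH."""
    return shutil.which("ustacks") is not None
```

For example, run_steps(BUILD_STEPS, cwd=src) followed by stacks_installed(), where src is a placeholder for wherever you unpacked the source.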

Installing on your cluster

Stacks uses OpenMP to run on multiple CPUs. OpenMP parallelizes across the cores of a single shared-memory machine, not across the nodes of a cluster. I have a small commodity-hardware cluster that is quite handy for MapReduce and MPI processing, but for this case I can use only one machine. So I'm using my notebook with its four cores, with Ganglia and htop/free to monitor CPU and memory usage.
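OpenMP programs commonly read the standard OMP_NUM_THREADS environment variable to decide how many threads to start. A minimal sketch, assuming you launch Stacks programs from Python: openmp_env (a hypothetical helper) builds an environment with the thread count set, defaulting to the number of cores. Under the OpenMP rules, a thread count set explicitly inside the program takes precedence over the variable, so whether a particular Stacks program honours it, or uses its own command-line option instead, is something to check against that program's usage message.

```python
import os


def openmp_env(threads=None):
    """Return a copy of the current environment with OMP_NUM_THREADS set.

    Defaults to the number of CPUs Python can see (falling back to 1).
    """
    if threads is None:
        threads = os.cpu_count() or 1
    if threads < 1:
        raise ValueError("threads must be at least 1")
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(threads)
    return env
```

Usage would look like subprocess.run(cmd, env=openmp_env(4)), where cmd is the Stacks command you would otherwise type in the shell.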

Following the Stacks tutorial

The tutorial requires that you download samples and create some directories and a database. Here’s the status of my system before the analysis.

CPU, memory and disk usage

I was already using Ganglia to monitor Hadoop jobs, so I saved its graphs for the time frame of the pipeline run. I started the analysis at 10 AM and finished about two and a half hours later (the process is not automated, so at times I had to check the output and documentation before running the next step; that waiting time is included in the total).
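Because the steps were run by hand, one way to separate the actual processing time of each step from the time spent reading documentation is to wrap every command in a small timer. A minimal sketch; timed_run and report are hypothetical helpers, and the step labels and commands are whatever you would type for each tutorial step.

```python
import subprocess
import time


def timed_run(label, cmd, log):
    """Run cmd, append (label, elapsed_seconds, returncode) to log, return the returncode.

    Wall-clock time is measured with time.perf_counter, so it includes
    time the command spends waiting on the disk or the database.
    """
    start = time.perf_counter()
    result = subprocess.run(cmd)
    elapsed = time.perf_counter() - start
    log.append((label, elapsed, result.returncode))
    return result.returncode


def report(log):
    """Print one line per step plus the summed processing time."""
    for label, seconds, code in log:
        print(f"{label:<20} {seconds:10.1f} s  (exit {code})")
    print(f"{'total':<20} {sum(s for _, s, _ in log):10.1f} s")
```

The summed total from report(log) can then be compared with the overall wall-clock time to see how much of it was spent between steps.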

Below you’ll find the raw graphs and another version with legends to help show which step was being executed.

CPU usage

Memory usage

Disk usage

I didn't think the Ganglia disk graphs would represent the disk usage very well, especially since my disk has about 500GB, which makes relatively small changes hard to see. So I used R to plot a graph that I hope better shows the disk usage of this tutorial. Keep in mind that we are using tutorial samples, and your own analysis may use more or less disk space.
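The numbers for such a plot have to be collected somehow. One option, sketched below under the assumption that the pipeline writes to the filesystem containing the given path, is to sample shutil.disk_usage periodically and store the used space relative to the first sample, so that the changes are not dwarfed by the size of the disk. sample_disk and write_csv are hypothetical helpers, and the CSV file they produce can be loaded into R with read.csv for plotting.

```python
import csv
import shutil
import time


def sample_disk(path, samples, interval_s):
    """Sample used space on the filesystem holding path.

    Returns (unix_time, bytes_used_since_first_sample) tuples, so the first
    delta is always 0 and later ones show growth or shrinkage.
    """
    rows = []
    baseline = None
    for i in range(samples):
        used = shutil.disk_usage(path).used
        if baseline is None:
            baseline = used
        rows.append((time.time(), used - baseline))
        if i < samples - 1:  # no need to sleep after the last sample
            time.sleep(interval_s)
    return rows


def write_csv(rows, filename):
    """Write the samples with a header row for later plotting."""
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time", "delta_bytes"])
        writer.writerows(rows)
```

For example, sample_disk(".", samples=180, interval_s=60) covers three hours at one-minute resolution. Note that other processes writing to the same disk will show up in the deltas too.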

I hope this helps you use Stacks. Kudos to all the developers and contributors of this great tool.