pBWA

Parallel Burrows-Wheeler Aligner

Update - October 9th, 2012

An alternate version of pBWA is now provided. This version allows both ends of a paired-end set to be aligned with a single aln command, and it no longer requires the number of reads in the input FASTQ files to be specified.
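Before this update, the user had to supply the number of reads in each input FASTQ file. In a standard FASTQ file, each record spans exactly four lines (header, sequence, '+' separator, qualities), so the read count is the number of lines divided by four. The Python sketch below illustrates that arithmetic under the four-line assumption (wrapped, multi-line records are not handled). count_fastq_reads is a hypothetical helper and is not part of pBWA.

```python
def count_fastq_reads(lines):
    """Count reads in FASTQ text, assuming every record spans exactly four lines.

    Hypothetical helper for illustration; not part of pBWA.
    """
    n_lines = 0
    for line in lines:
        if line.strip():  # skip blank lines, e.g. a trailing newline at end of file
            n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} non-blank lines is not a multiple of 4")
    return n_lines // 4


example = """@read1
ACGT
+
IIII
@read2
TTGCA
+
IIIII
"""
print(count_fastq_reads(example.splitlines()))  # prints 2
```

For a file on disk, pass an open file object instead of a list of strings. Each line then still carries its newline, which strip() removes before the check.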

Introduction

pBWA is a parallel implementation of the popular software BWA. It was developed on SHARCNET by modifying the BWA source code to use the OpenMPI C library. pBWA has also been successfully tested on other systems with basic OpenMPI installations. pBWA currently implements three commands from BWA: aln, samse, and sampe. It retains and improves upon the multithreading provided by BWA while adding efficient parallelization for these three commands. In testing, pBWA's wall-time speedup has been bounded only by the size of the available parallel system, since pBWA can run on any number of nodes and/or cores simultaneously.
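This section does not describe how pBWA divides work among its MPI processes. Purely as an illustration, one common way to parallelize alignment is a contiguous block split, where each process (rank) aligns a roughly equal slice of the reads. The hypothetical read_range function below computes the half-open range of read indices for one rank. It is a sketch of that general technique, not pBWA's code.

```python
def read_range(n_reads, n_ranks, rank):
    """Half-open [start, end) range of read indices assigned to `rank`.

    Contiguous block distribution: the first (n_reads % n_ranks) ranks get
    one extra read, so slice sizes differ by at most one.
    Hypothetical helper for illustration; not pBWA's partitioning code.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank must be in [0, n_ranks)")
    base, extra = divmod(n_reads, n_ranks)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


# 10 reads over 4 ranks -> [(0, 3), (3, 6), (6, 8), (8, 10)]
print([read_range(10, 4, r) for r in range(4)])
```

The slices are computed independently from the rank number, so no process needs to communicate with the others to find its share of the input.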

Requirements

pBWA requires a multi-node (or multi-core) *nix system with a parallel job scheduler and the OpenMPI C library in order to compile and run. Each pBWA process requires as much RAM as BWA would need for the same dataset and parameters. pBWA also requires a BWA-generated index for each genome that reads will be aligned to.
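Because every pBWA process needs a full BWA-sized memory footprint, a job's memory use grows linearly with the number of processes. The hypothetical check_memory helper below sketches a planning check: it confirms that the processes placed on one node fit in that node's RAM and reports the total for the whole job. The per-process figure is an assumption you would take from your own BWA runs, and the 32 GB node size in the usage line is an illustrative value.

```python
def check_memory(processes_per_node, gb_per_process, node_ram_gb, nodes=1):
    """Return (GB used per node, GB used by the whole job).

    Raises ValueError if the processes on one node would need more RAM
    than the node has. Hypothetical planning helper, not part of pBWA.
    """
    per_node = processes_per_node * gb_per_process
    if per_node > node_ram_gb:
        raise ValueError(f"{per_node} GB per node exceeds the node's {node_ram_gb} GB")
    return per_node, per_node * nodes


# Illustrative cluster: 32 GB nodes, 8 processes of 4 GB each on 30 nodes
# (240 processes in total).
print(check_memory(processes_per_node=8, gb_per_process=4, node_ram_gb=32, nodes=30))
# prints (32, 960)
```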

How to Get

The project page, along with available downloads and support forums, can be found here.

How to Use

pBWA can be executed once an index has been built with BWA's index command. The job-submission examples in this section use a parallel scheduler, and their scheduling flags match those used on SHARCNET.

Running without multithreading is best for systems with sufficient RAM (at least 4 GB per core). Because every core then runs its own pBWA process, all stages (aln and sampe) receive the maximum amount of parallelism, including the single-threaded sampe stage. Note that each stage of pBWA MUST be executed with the same number of parallel processes, although the number of threads per process can differ.
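The rule that every stage must use the same number of processes, while thread counts may vary, can be expressed as a simple pre-submission check. In the sketch below, the plan dictionary and the validate_stage_plan function are hypothetical planning aids, not pBWA options. The example plan gives aln 6 threads per process and sampe 1 thread per process, with 40 processes in both stages.

```python
def validate_stage_plan(plan):
    """plan maps stage name -> (processes, threads_per_process).

    All stages must use the same number of processes; threads may differ.
    Returns the shared process count. Hypothetical helper, not part of pBWA.
    """
    if not plan:
        raise ValueError("plan must contain at least one stage")
    process_counts = {procs for procs, _threads in plan.values()}
    if len(process_counts) != 1:
        raise ValueError(f"stages use different process counts: {sorted(process_counts)}")
    return process_counts.pop()


plan = {"aln": (40, 6), "sampe": (40, 1)}
print(validate_stage_plan(plan))  # prints 40
```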

In the scheduler command for this case, the -q flag specifies that pBWA is an MPI program and the -n flag requests 240 parallel processes executing pBWA. The -r flag sets the run-time limit requested from the system, and the --mpp flag tells the system to give each parallel process 4 GB of RAM.

Combining multithreading with parallelism is best for systems lacking sufficient RAM per core. The right numbers of processes and threads depend on the system, so play around and see what works best for yours! The example described here is for the Orca cluster on SHARCNET (24 cores/node and 1.33 GB RAM/core). Note that some systems do not support combining parallelism with multithreading, and attempting it can lead to unexpected results such as segmentation faults.

In the scheduler command for this example, the -N flag spreads the 40 processes evenly over 10 nodes (4 per node). Requesting 8 GB of memory per process (more than pBWA needs) means the 4 processes on each node reserve all of that node's RAM (4 x 8 GB = 32 GB, about 24 x 1.33 GB). As a result, the node's remaining 20 cores stay free for threading. Each process can then spawn 6 threads (4 x 6 = 24, one per core), for a total of 240 threads of execution. Note that sampe can only run with 40 threads of execution (one per process) because it is not multithreaded. Future releases of pBWA are expected to add multithreading to samse/sampe.
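The thread arithmetic for the Orca example can be checked with a short calculation. The hypothetical hybrid_layout function below assumes that processes divide evenly across nodes and that each node's cores divide evenly among its processes, as in the example. It reports threads per process, the thread count available to the multithreaded aln stage, and the one-thread-per-process count for sampe.

```python
def hybrid_layout(processes, nodes, cores_per_node):
    """Per-node and total thread counts when processes fill whole nodes.

    Assumes an even split of processes over nodes and of cores over the
    processes on a node. Hypothetical helper, not part of pBWA.
    """
    if processes % nodes:
        raise ValueError("processes must divide evenly across nodes")
    per_node = processes // nodes
    if cores_per_node % per_node:
        raise ValueError("cores per node must divide evenly among processes")
    threads_per_process = cores_per_node // per_node
    return {
        "processes_per_node": per_node,
        "threads_per_process": threads_per_process,
        "aln_threads": processes * threads_per_process,  # multithreaded stage
        "sampe_threads": processes,  # sampe runs one thread per process
    }


print(hybrid_layout(processes=40, nodes=10, cores_per_node=24))
# {'processes_per_node': 4, 'threads_per_process': 6, 'aln_threads': 240, 'sampe_threads': 40}
```

The same function applied to the single-threaded layout (for example 240 processes on 10 nodes of 24 cores) gives 1 thread per process, so aln and sampe both run with 240 threads of execution.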