In the Trenches at GTC: Swift: A GPU-based Smith-Waterman Sequence Alignment Program

This week at GTC, Pankaj Gupta, a bioinformatics application developer at St. Jude Children’s Research Hospital, presented his work in the area of sequence alignment. Sequence alignment is an important component of bioinformatics that is crucial for the vision of personalized medicine. It matches new sequences against known sequences and, in doing so, detects point, insertion, and deletion mutations, which supports important initiatives such as finding better cures.

Current sequencing machines produce reads of gene samples through optical or electrical means. The read signals are converted into sequences of symbols called “bases” through machine-specific signal processing, otherwise known as “primary analysis”. Machine-agnostic sequence alignment algorithms are then applied to interpret the read sequences in the context of known sequences.

Pankaj Gupta’s new program, Swift, helps accelerate the sequence alignment process on the GPU. He chose to implement the Smith-Waterman algorithm, targeting alignments of reads of ~100 base pairs against references of ~300 base pairs. Each read and reference pair is assigned to a thread block, and the threads within a thread block operate on the dynamic programming dependency wavefront.
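
To make the wavefront idea concrete, here is a minimal CPU sketch in Python. It is not Swift's code: the function name, the scoring values (match +2, mismatch -1) and the linear gap penalty are all assumptions chosen for illustration. It fills the Smith-Waterman score matrix one anti-diagonal at a time. Every cell on an anti-diagonal depends only on cells from the two previous anti-diagonals, which is why the inner loop could in principle be spread across the threads of a block.

```python
def smith_waterman_wavefront(read, ref, match=2, mismatch=-1, gap=-1):
    """Score a local alignment, filling the matrix by anti-diagonals.

    Scoring parameters are illustrative assumptions, not Swift's defaults.
    Returns (best_score, (row, col) of the best cell, score matrix).
    """
    n, m = len(read), len(ref)
    # H[i][j] = best local alignment score ending at read[i-1] and ref[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best_score, best_cell = 0, (0, 0)

    # d = i + j indexes one anti-diagonal (the "wavefront").
    for d in range(2, n + m + 1):
        # Cells in this inner loop are mutually independent; on a GPU
        # each one could be handled by a different thread.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # match or mismatch
                H[i - 1][j] + gap,    # gap in the reference
                H[i][j - 1] + gap,    # gap in the read
            )
            if H[i][j] > best_score:
                best_score, best_cell = H[i][j], (i, j)
    return best_score, best_cell, H


score, end_cell, _ = smith_waterman_wavefront("ACGT", "TTACGTTT")
print(score, end_cell)
```

With a ~100-base read and a ~300-base reference, the longest anti-diagonal holds about 100 cells, so a single thread block has enough independent work per step to stay busy.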

An initial version of Swift is available on SourceForge under a GPL license (see: http://sourceforge.net/projects/swiftseqaligner/). The current version accepts a read file and a reference file, both in FASTA format, as inputs. For each alignment it outputs the reference name, read name, gapped alignment, alignment score, alignment start and end positions, and alignment length.
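
For readers unfamiliar with FASTA, each record is a header line starting with ">" followed by one or more lines of sequence. The sketch below shows one way to read such a file in Python. The `parse_fasta` helper and the sample data are hypothetical and not part of Swift, which has its own reader.

```python
import io


def parse_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the record name.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


# In-memory stand-in for a small read file (made-up data).
sample = io.StringIO(">read1 sample\nACGT\nTTGA\n>read2\nggcc\n")
records = list(parse_fasta(sample))
print(records)
```

The same function would serve for both the read file and the reference file, since the two inputs share the format.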

I see Pankaj’s current work as a promising starting point for making an impact on the field of bioinformatics. I have a close relative who is living with cancer. Pankaj’s work gives me hope that a whole generation of young researchers will master both bioinformatics and computer science, substantively advance this field, and work towards a cure.

About Mark Harris

Mark is Chief Technologist for GPU Computing Software at NVIDIA. Mark has fifteen years of experience developing software for GPUs, ranging from graphics and games, to physically-based simulation, to parallel algorithms and high-performance computing. Mark has been using GPUs for general-purpose computing since before they even supported floating point arithmetic. While a Ph.D. student at UNC he recognized this nascent trend and coined a name for it: GPGPU (General-Purpose computing on Graphics Processing Units), and started GPGPU.org to provide a forum for those working in the field to share and discuss their work. Follow @harrism on Twitter.