Motivation

We want to better understand the sensitivity and specificity of particular sequences and regions in the genome, with the ultimate goal of faster, higher-quality genome alignments.

We also want to establish time and cost benchmarks for processing large volumes of human genome data.

Methods

We processed a sliding-window data set of 50 bp reads, tiled at a 1 bp step across human genome build 19 (HG19), using Ion Torrent’s TMAP algorithm (written by Dr. Nils Homer). The specificity/sensitivity analysis is underway and will be published here in a follow-up blog post.

Results

Let’s look at the gross statistics of the operation:

How were the data generated?

Input was a 1 bp step, 50-mer tiling data set generated from HG19. We’re not hosting these data, but you can generate them yourself with this make_reads.pl script.
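To make the tiling concrete, here is a minimal Python sketch of a 1 bp step, 50-mer sliding window over a single sequence. It is not the make_reads.pl script and may differ from it in details. The function names, the read-naming scheme, and the choice to skip windows containing N are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple


def tile_reads(
    chrom: str,
    sequence: str,
    k: int = 50,
    step: int = 1,
    skip_n: bool = True,
) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, read_sequence) for each k-mer window.

    Hypothetical helper: names encode 1-based inclusive coordinates
    as chrom:start-end. Skipping N-containing windows is an assumption,
    not something the original script is known to do.
    """
    # The last valid window starts at len(sequence) - k.
    for start in range(0, len(sequence) - k + 1, step):
        window = sequence[start:start + k].upper()
        if skip_n and "N" in window:
            continue
        yield f"{chrom}:{start + 1}-{start + k}", window


def to_fasta(reads: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in reads)


if __name__ == "__main__":
    # Toy example with a 4-mer window so the output stays readable.
    print(to_fasta(tile_reads("chrT", "ACGTACGTAC", k=4)), end="")
```

A sequence of length n yields at most n - k + 1 reads at a 1 bp step, so the tiled data set has nearly as many reads as the genome has bases. This is why the time and cost of processing matter here.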

Output was a set of SAM files from TMAP, which were then post-processed to a more compact form. See the Data section, below.