bam-readcount

The purpose of this program is to generate metrics at single nucleotide positions.
A number of the metrics it generates can be useful for filtering out false positive calls.
Help is available on the command line if you run the program without any arguments.

This program reports readcounts for each base at each position requested.
It also reports the average base quality of these bases and mapping qualities of
the reads containing each base.

The list of regions should be formatted as chromosome, start, and end. Each field should be tab separated and coordinates should be 1-based.
The optional region specification on the command line should follow the same format as that used for samtools (chr:start-stop).
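As a minimal sketch of the two region formats described above, the following Python formats a tab-separated, 1-based site list and a samtools-style region string. The helper names (format_site_list, region_string) are illustrative assumptions and are not part of bam-readcount.

```python
def format_site_list(regions):
    """Return site-list text: one 'chromosome<TAB>start<TAB>end' line per region.

    Coordinates are expected to be 1-based, as the README requires.
    This helper is hypothetical; bam-readcount only needs the resulting text.
    """
    lines = []
    for chrom, start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"invalid 1-based region: {chrom} {start} {end}")
        lines.append(f"{chrom}\t{start}\t{end}")
    return "\n".join(lines) + "\n"


def region_string(chrom, start, stop):
    """Format a single region in the samtools style: chr:start-stop."""
    return f"{chrom}:{start}-{stop}"
```

The text returned by format_site_list can be written to a file and used as the list of regions, while region_string produces the form expected for the optional region argument on the command line.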

avg_pos_as_fraction → average position on the read as a fraction (calculated with respect to the length after clipping). This value is normalized to the center of the read (bases occurring strictly at the center of the read have a value of 1, those occurring strictly at the ends should approach a value of 0)

avg_num_mismatches_as_fraction → average number of mismatches on these reads per base

avg_sum_mismatch_qualities → average sum of the base qualities of mismatches in the reads

num_q2_containing_reads → number of reads with q2 runs at the 3’ end

avg_distance_to_q2_start_in_q2_reads → average distance of position (as fraction of unclipped read length) to the start of the q2 run

avg_clipped_length → average clipped length of the reads

avg_distance_to_effective_3p_end → average distance to the 3’ end of the read (as fraction of unclipped read length)
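To make the normalization behind avg_pos_as_fraction more concrete, here is a small Python sketch of one formula that matches the description above (1 at the center of the clipped read, approaching 0 at the ends). The exact calculation used inside bam-readcount is not stated here, so the formula, the 0-based offset convention, and the function names are all assumptions.

```python
def centered_position(pos_in_read, clipped_length):
    """Return 1.0 at the read center, falling toward 0.0 at either end.

    pos_in_read is taken to be a 0-based offset within the clipped read;
    that convention is an assumption of this sketch.
    """
    if clipped_length <= 0:
        raise ValueError("clipped_length must be positive")
    center = clipped_length / 2.0
    return 1.0 - abs(pos_in_read - center) / center


def avg_pos_as_fraction(observations):
    """Average the centered position over (pos_in_read, clipped_length) pairs."""
    values = [centered_position(pos, length) for pos, length in observations]
    return sum(values) / len(values)
```

Under this sketch, a base seen only near read ends gets a low average, which is the kind of signal that can help flag false positive calls.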

Per Library Output

Beginning with version 0.5.0, bam-readcount can output counts for each library within a single BAM file. Libraries are reported with the same metrics as in
normal counting, but each library is listed by name and its metrics are enclosed in curly braces.

In the output, after the chromosome, position, reference base, and depth fields, the per library output is formatted as lib_name[TAB]{bam readcount metrics}[TAB]lib_name[TAB]{bam readcount metrics}. The libraries are repeated in this format from library 1 to library N.
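The layout above can be split apart with a short Python sketch like the one below. It treats whatever sits between the curly braces as an opaque string, since the inner layout of the metrics is not described here. The function name and the returned dictionary keys are illustrative assumptions.

```python
import re

# A library name, a tab, then a brace-delimited metrics block.
_LIBRARY_BLOCK = re.compile(r"([^\t{}]+)\t\{([^{}]*)\}")


def parse_per_library_line(line):
    """Split one per-library output line into its leading fields and libraries.

    The metrics inside each pair of braces are returned unparsed.
    """
    chrom, pos, ref_base, depth, rest = line.rstrip("\n").split("\t", 4)
    libraries = {
        name: metrics.strip("\t")
        for name, metrics in _LIBRARY_BLOCK.findall(rest)
    }
    return {
        "chrom": chrom,
        "position": int(pos),
        "ref_base": ref_base,
        "depth": int(depth),
        "libraries": libraries,
    }
```

Matching on the braces rather than splitting on every tab keeps the sketch working even if the metrics inside a block contain tabs of their own.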

Note that for some RPM based systems (like RHEL) and older versions of Ubuntu, you will need to install cmake 2.8.3 or greater yourself as the packaged version is older.

Build Instructions

Clone the bam-readcount repository

Recursively clone the git repository

git clone --recursive https://github.com/genome/bam-readcount.git

Compile bam-readcount

bam-readcount does not support in-source builds. Create a new build directory, enter it, and run:

cmake /path/to/bam-readcount/repo
make

The binary can then be found in the bin/ subdirectory of your build directory.

FAQ

I get errors from cmake about missing modules. How do I fix this?

These errors should no longer occur as of commit e48b05cbb9a9222d7585958f9704b147f0a4cbea. For earlier commits, bam-readcount contains a git submodule called build-common. It contains helper modules for cmake. If you downloaded the source as a tarball from GitHub or forgot to do a recursive clone using git, then you will not have this submodule and will see cmake errors. If you are using git, we recommend you go back and use the --recursive option when cloning the repository. If you cannot use git, follow the instructions below to remedy the situation.

Download the build-common source as a tarball from GitHub.

Extract that tarball, and rename the directory it creates to ‘build-common’.

Replace the empty build-common subdirectory in the bam-readcount directory with the directory you just created.

Resume following the build instructions.

I get a segfault when attempting to run on a whole bam file.

This is a known bug that will be addressed in future versions. In most cases, you will only need read counts on a subset of positions, and specifying the sites of interest works. Should you need to run the entire genome, running by chromosome and concatenating the results should be a decent workaround (e.g. bam-readcount -f ref.fasta some.bam chr1).
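The per-chromosome workaround can be scripted. The Python sketch below builds one command per chromosome in the same form as the example above and appends each run's output to a single file. It assumes bam-readcount is on your PATH and that the chromosome names match those in the BAM file; the function names are illustrative.

```python
import subprocess


def per_chromosome_commands(reference_fasta, bam_path, chromosomes):
    """Build one bam-readcount command per chromosome (same form as the README example)."""
    return [
        ["bam-readcount", "-f", reference_fasta, bam_path, chrom]
        for chrom in chromosomes
    ]


def run_by_chromosome(reference_fasta, bam_path, chromosomes, output_path):
    """Run each command in turn, concatenating the results into output_path.

    Assumes the bam-readcount binary is available on PATH.
    """
    with open(output_path, "w") as out:
        for command in per_chromosome_commands(reference_fasta, bam_path, chromosomes):
            # check=True stops at the first failing chromosome.
            subprocess.run(command, stdout=out, check=True)
```

For example, run_by_chromosome("ref.fasta", "some.bam", ["chr1", "chr2"], "counts.txt") would run the two chromosomes one after the other and collect both results in counts.txt.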

User Support

Please first search Biostar and then ask a question there if needed. We automatically monitor Biostar for questions related to our tools.