Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery
and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calculate the fraction of reads coming from cross-sample contamination

Category
Diagnostics and Quality Control

Overview

Given pileup data from GetPileupSummaries, calculates the fraction of reads coming from cross-sample contamination.

The resulting contamination table is used with FilterMutectCalls.

This tool and GetPileupSummaries together replace GATK3's ContEst. Like ContEst, this tool estimates contamination from the signal
of reference-allele (ref) reads at homozygous-alternate (hom alt) sites. However, ContEst uses a probabilistic model that assumes a
diploid genotype with no copy number variation and independent contaminating reads; that is, it assumes that each contaminating read
is drawn randomly and independently from a different human. This tool uses a simpler estimate of contamination that relaxes these
assumptions. In particular, it works in the presence of copy number variations and with an arbitrary number of contaminating samples.
In addition, this tool is designed to work well without matched normal data. When a matched normal is available, one can run
GetPileupSummaries on the matched normal BAM file and input the result to this tool.
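
To make the idea of "ref reads at hom alt sites" concrete, here is a minimal Python sketch of one simple estimator. It is an illustration, not the tool's actual model: it assumes the sites passed in are already known to be hom alt in the sample, that a contaminating read carries the ref allele with probability one minus the population alt allele frequency, and that sequencing error is negligible. The function name and the tuple layout are hypothetical.

```python
from typing import Iterable, Tuple

# Each site: (ref_count, alt_count, allele_frequency), where allele_frequency
# is the population frequency of the alt allele. Every site is ASSUMED to be
# homozygous alt in the sample; selecting such sites is outside this sketch.
Site = Tuple[int, int, float]


def estimate_contamination(hom_alt_sites: Iterable[Site]) -> float:
    """Illustrative estimate only; not the model used by CalculateContamination."""
    observed_ref_reads = 0.0
    expected_ref_reads_at_full_contamination = 0.0
    for ref_count, alt_count, allele_frequency in hom_alt_sites:
        depth = ref_count + alt_count
        # At a hom alt site, ref reads are attributed to contamination.
        observed_ref_reads += ref_count
        # A contaminating read shows the ref allele with probability (1 - f).
        expected_ref_reads_at_full_contamination += depth * (1.0 - allele_frequency)
    if expected_ref_reads_at_full_contamination == 0.0:
        raise ValueError("no informative sites")
    return observed_ref_reads / expected_ref_reads_at_full_contamination


# Made-up counts, for illustration only.
example_sites = [(2, 98, 0.5), (1, 99, 0.5)]
print(estimate_contamination(example_sites))
```

Because the estimate pools ref reads across sites without modelling a genotype, it does not depend on a diploid assumption or on how many individuals contributed the contaminating reads.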

The resulting table reports the contamination fraction, one line per sample, as two tab-separated fields: the sample ID and the contamination.
The file has no header.
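
A table in this layout can be read with a few lines of Python. The sketch below assumes exactly the format described above (two tab-separated fields per line, no header); the function name is hypothetical, and the parser should be adjusted if the output of your GATK version differs.

```python
import io
from typing import Dict, TextIO


def read_contamination_table(handle: TextIO) -> Dict[str, float]:
    """Parse headerless lines of the form 'SampleID<TAB>Contamination'."""
    contamination_by_sample: Dict[str, float] = {}
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 tab-separated fields, got {len(fields)}"
            )
        sample_id, contamination = fields
        contamination_by_sample[sample_id] = float(contamination)
    return contamination_by_sample


# Made-up content, for illustration only.
example = io.StringIO("tumor_sample\t0.021\n")
print(read_contamination_table(example))
```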

Example: matched normal mode
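
A matched normal run supplies two pileup tables from GetPileupSummaries, one for the tumor and one for the normal. The Python sketch below only assembles such an invocation as an argument list; it does not run anything. The short flags `-I`, `-matched` and `-O` are given as assumptions to be checked against the argument table for your GATK version, and the file names and helper name are hypothetical.

```python
import shlex
from typing import List, Optional


def build_calculate_contamination_args(
    tumor_pileups: str,
    output_table: str,
    normal_pileups: Optional[str] = None,
) -> List[str]:
    """Assemble a CalculateContamination command line (flag names are assumptions)."""
    args = ["gatk", "CalculateContamination", "-I", tumor_pileups]
    if normal_pileups is not None:
        # Matched normal mode: pileup summaries computed on the normal sample.
        args += ["-matched", normal_pileups]
    args += ["-O", output_table]
    return args


# Hypothetical file names, for illustration only.
command = build_calculate_contamination_args(
    "tumor-pileups.table", "contamination.table", normal_pileups="normal-pileups.table"
)
print(" ".join(shlex.quote(part) for part in command))
```

Omitting `normal_pileups` yields the invocation for a run without a matched normal; the list can be passed to `subprocess.run` once the flags have been confirmed.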

CalculateContamination specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.