Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery
and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Category
Variant Filtering

Overview

Create a panel of normals (PoN) containing germline and artifactual sites for use with Mutect2.

The tool takes multiple normal-sample callsets produced by Mutect2's tumor-only mode and collates them into a single
variant call format (VCF) file of false positive calls. The PoN captures common artifactual and germline variant sites.
Mutect2 then uses the PoN to filter variants at the site level.

This contrasts with the GATK3 workflow, which uses CombineVariants to retain variant sites called in at least
two samples and then uses Picard MakeSitesOnlyVcf to simplify the callset for use as a PoN.
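
The "called in at least two samples" rule of the GATK3 workflow can be illustrated with a small sketch. This is not the implementation of CombineVariants or of CreateSomaticPanelOfNormals. The function name, the `(chrom, pos, ref, alt)` site tuples, and the `min_samples` parameter are assumptions made for illustration only.

```python
from collections import Counter


def sites_in_at_least(callsets, min_samples=2):
    """Return sites that appear in at least `min_samples` callsets.

    callsets: hypothetical mapping of sample name -> iterable of
    (chrom, pos, ref, alt) tuples. Real callsets would be read from VCFs.
    """
    counts = Counter()
    for sites in callsets.values():
        # Count each site at most once per sample.
        counts.update(set(sites))
    return sorted(site for site, n in counts.items() if n >= min_samples)
```

Sample names and genotypes are dropped in the result, which mirrors the sites-only simplification described above.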

The tool also accepts multiple .args files. Pass each in with the -vcfs option.

By default (THROW_ERROR) the tool fails if multiple VCFs have the same sample name. Set the --duplicate-sample-strategy argument to
ALLOW_ALL to allow duplicates, or to CHOOSE_FIRST to use only the first VCF with a given sample name.
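
The three strategies can be read as a simple selection rule over the input VCFs. The following is a minimal sketch of that rule as described here, not the tool's actual code. The `select_vcfs` function and the `(path, sample_name)` input pairs are hypothetical, and the enum is a Python stand-in for the tool's DuplicateSampleStrategy type.

```python
from enum import Enum


class DuplicateSampleStrategy(Enum):
    THROW_ERROR = "THROW_ERROR"
    CHOOSE_FIRST = "CHOOSE_FIRST"
    ALLOW_ALL = "ALLOW_ALL"


def select_vcfs(vcfs, strategy=DuplicateSampleStrategy.THROW_ERROR):
    """Pick which VCFs to use, given (path, sample_name) pairs in input order."""
    seen = set()
    selected = []
    for path, sample in vcfs:
        if sample in seen:
            if strategy is DuplicateSampleStrategy.THROW_ERROR:
                raise ValueError(f"duplicate sample name: {sample}")
            if strategy is DuplicateSampleStrategy.CHOOSE_FIRST:
                continue  # keep only the first VCF for this sample
            # ALLOW_ALL falls through and keeps the duplicate.
        seen.add(sample)
        selected.append(path)
    return selected
```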

See Mutect2 documentation for usage examples.

CreateSomaticPanelOfNormals specific arguments

The command-line arguments that are specific to this tool are described below.

--duplicate-sample-strategy
How to handle duplicate samples: THROW_ERROR to fail, CHOOSE_FIRST to use the first VCF with each sample name, ALLOW_ALL to use all samples regardless of duplicate sample names.

The --duplicate-sample-strategy argument is an enumerated type (DuplicateSampleStrategy), which can have one of the following values: THROW_ERROR, CHOOSE_FIRST, or ALLOW_ALL.

-vcfs
VCFs for samples to include. Specify them explicitly on the command line, one at a time, or pass one or more .args files
that each list multiple VCFs, one per line.
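
As a rough illustration of the two input styles, the sketch below writes an .args file with one VCF per line and builds the repeated -vcfs arguments. The helper names and file names are hypothetical. The launcher command and any output argument are deliberately left out because this page does not show them.

```python
from pathlib import Path


def write_args_file(vcf_paths, args_path):
    """Write one VCF path per line, the layout described for .args files."""
    Path(args_path).write_text("".join(f"{p}\n" for p in vcf_paths))
    return str(args_path)


def vcfs_arguments(inputs):
    """Build argument tokens, passing each VCF or .args file with -vcfs."""
    tokens = []
    for item in inputs:
        tokens += ["-vcfs", str(item)]
    return tokens
```

For example, `vcfs_arguments(["normals.args", "extra_normal.vcf.gz"])` mixes an .args file with an explicitly named VCF.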