Broad Mini Mutation Calling Workspace

Overview

This "Mini" Mutation Calling Tutorial includes a subset of the tools in our complete Broad Mutation Calling Workflow: ContEst, MuTect, and Oncotator. When run on "mini" tumor and cell line BAMs (containing only 100 genes), the expected runtime is roughly 30 minutes.

ContEst

ContEst estimates contamination levels in next-generation sequencing data. It uses a Bayesian approach to calculate the posterior probability of the contamination level and determine the maximum a posteriori probability (MAP) estimate of the contamination level.

Oncotator

Oncotator is a tool for annotating genomic point mutations (SNPs/SNVs) and indels. It is primarily used for human genome variant callsets, but it can also annotate variant callsets from any organism with any kind of information.

Method Flow

Below is an overview of the individual tools within the Broad Mutation Calling Workflow.

What does ContEst do?

ContEst uses a Bayesian approach to calculate the posterior probability of the contamination level and determine the maximum a posteriori probability (MAP) estimate of the contamination level.
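As a rough illustration of what a MAP estimate means here, the sketch below scores a grid of candidate contamination levels and returns the one with the highest posterior. The likelihood model (at a homozygous alternate site, a read shows the reference allele either because it came from a contaminant carrying the reference or because of a sequencing error), the uniform prior, the input tuple format, and all function and parameter names are simplifying assumptions for illustration, not ContEst's actual implementation.

```python
import math

def map_contamination(sites, error_rate=0.001, grid_size=1001):
    """Return the candidate contamination level with the highest posterior.

    sites: list of (ref_reads, alt_reads, pop_ref_freq) tuples for sites
    assumed to be homozygous alternate in the sample (hypothetical format).
    A uniform prior is assumed, so the MAP equals the maximum likelihood.
    """
    best_c, best_logp = 0.0, -math.inf
    for i in range(grid_size):
        c = i / (grid_size - 1)  # candidate contamination fraction in [0, 1]
        logp = 0.0
        for ref_reads, alt_reads, pop_ref_freq in sites:
            # Assumed model: probability that a single read shows the reference allele.
            p_ref = c * pop_ref_freq + (1 - c) * error_rate
            p_ref = min(max(p_ref, 1e-12), 1 - 1e-12)  # keep the logs finite
            logp += ref_reads * math.log(p_ref) + alt_reads * math.log(1 - p_ref)
        if logp > best_logp:
            best_c, best_logp = c, logp
    return best_c
```

With 5% reference reads at sites where the population reference frequency is 0.5, this toy model lands near 10% contamination, since only about half of contaminating reads would carry the reference allele.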

ContEst supports an array-free mode, in which it genotypes on the fly from the matched normal and uses those genotypes as the source of homozygous variant calls. It currently calls a site homozygous alternate when more than 80% of the bases are the alternate allele and coverage is at least 50X.
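The homozygous alternate rule described above can be written as a small predicate. This is a minimal sketch: the function and parameter names are hypothetical, and only the two thresholds (more than 80% alternate bases, at least 50X coverage) come from the text.

```python
def is_hom_alt(alt_bases, total_bases, min_alt_fraction=0.80, min_coverage=50):
    """Return True if a site in the matched normal would be called homozygous alternate.

    alt_bases:   number of bases supporting the alternate allele
    total_bases: total coverage at the site
    """
    if total_bases < min_coverage:
        return False  # not enough coverage to genotype the site
    return alt_bases / total_bases > min_alt_fraction
```

For example, a site with 45 alternate bases out of 50 passes, while 40 out of 50 (exactly 80%) or any site below 50X does not.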

What does MuTect do?

Pre-process the aligned reads in the tumor and normal sequencing data.

In this step, MuTect ignores reads with too many mismatches or very low quality scores, since such reads introduce more noise than signal.
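To make the idea of this pre-processing step concrete, here is a minimal sketch of a read filter. The `Read` class, the field names, and both threshold values are hypothetical and chosen only for illustration; MuTect's real criteria are more detailed than a mismatch count and a single quality score.

```python
from dataclasses import dataclass

@dataclass
class Read:
    name: str
    mismatches: int        # number of mismatches against the reference
    mapping_quality: int   # aligner-reported mapping quality

def keep_read(read, max_mismatches=3, min_mapping_quality=20):
    """Return True if the read is clean enough to use (illustrative thresholds)."""
    return (read.mismatches <= max_mismatches
            and read.mapping_quality >= min_mapping_quality)

def filter_reads(reads, **thresholds):
    """Drop noisy reads before any statistical analysis."""
    return [r for r in reads if keep_read(r, **thresholds)]
```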

Use statistical analysis to identify sites that are likely, with high confidence, to carry somatic mutations.

The statistical analysis predicts a somatic mutation by using two Bayesian classifiers. The first aims to detect whether the tumor is non-reference at a given site. For sites found to be non-reference, the second classifier makes sure the normal does not carry the variant allele. In practice, the classification is performed by calculating a LOD score (log odds) and comparing it to a cutoff determined by the log ratio of prior probabilities of the considered events. For more information, refer to the MuTect Cancer Genome Analysis page.
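The two LOD comparisons can be sketched as follows. This is an illustration under stated assumptions, not MuTect's code: it assumes a per-read mixture model in which each base is drawn from the variant allele with fraction f and miscalled with probability e, estimates f as the observed alternate fraction, and models a germline variant in the normal as f = 0.5. All function names are hypothetical, and the cutoffs are left as caller-supplied parameters rather than fixed values.

```python
import math

def log10_likelihood(bases, error_probs, ref, alt, f):
    """Log10 likelihood of the observed bases given variant allele fraction f.

    Assumes every error probability e is strictly positive.
    """
    total = 0.0
    for base, e in zip(bases, error_probs):
        if base == ref:
            p = f * e / 3 + (1 - f) * (1 - e)
        elif base == alt:
            p = f * (1 - e) + (1 - f) * e / 3
        else:
            p = e / 3  # a third allele can only arise from an error
        total += math.log10(p)
    return total

def tumor_lod(bases, error_probs, ref, alt):
    """Log odds that the tumor is non-reference (variant model vs. reference model)."""
    f_hat = sum(b == alt for b in bases) / len(bases)
    return (log10_likelihood(bases, error_probs, ref, alt, f_hat)
            - log10_likelihood(bases, error_probs, ref, alt, 0.0))

def normal_lod(bases, error_probs, ref, alt):
    """Log odds that the normal is reference (reference model vs. assumed f = 0.5)."""
    return (log10_likelihood(bases, error_probs, ref, alt, 0.0)
            - log10_likelihood(bases, error_probs, ref, alt, 0.5))

def is_candidate_somatic(tumor, normal, ref, alt, tumor_cutoff, normal_cutoff):
    """tumor and normal are (bases, error_probs) pairs; cutoffs come from the caller."""
    return (tumor_lod(*tumor, ref, alt) >= tumor_cutoff
            and normal_lod(*normal, ref, alt) >= normal_cutoff)
```

A tumor pileup with a third of its reads supporting the alternate allele, paired with an all-reference normal, scores well above zero on both LODs, while a tumor with no alternate reads scores exactly zero on the first.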

Post-processing of candidate somatic mutations

This step aims to eliminate artifacts of next-generation sequencing, short read alignment and hybrid capture. For example, sequence context can cause hallucinated alternate alleles but often only in a single direction. Therefore, MuTect tests whether the alternate alleles supporting the mutations are observed in both directions.
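The strand check described above can be pictured with a deliberately simplified sketch that just requires alternate-allele support on both the forward and reverse strands. The function name and the `min_per_strand` parameter are hypothetical, and MuTect's actual test is statistical rather than a plain count threshold.

```python
def observed_on_both_strands(alt_forward, alt_reverse, min_per_strand=1):
    """Return True if alternate alleles are seen in both read directions.

    alt_forward: alternate-supporting reads on the forward strand
    alt_reverse: alternate-supporting reads on the reverse strand
    """
    return alt_forward >= min_per_strand and alt_reverse >= min_per_strand
```

A candidate with 12 alternate reads on the forward strand and none on the reverse would be rejected as a likely sequence-context artifact.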

What does Oncotator do?

By default, Oncotator uses a simple TSV file (e.g., MAFLITE) as an input and produces a TCGA MAF as an output. Oncotator also supports VCF files as an input and output format.
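For readers unfamiliar with the TSV input, the sketch below writes a minimal tab-separated file. The five column names are an assumption based on the commonly documented MAFLITE layout, and the `to_maflite` helper is hypothetical; check the Oncotator documentation for the exact required columns before relying on it.

```python
import csv
import io

# Assumed minimal MAFLITE columns (verify against the Oncotator documentation).
MAFLITE_COLUMNS = ["chr", "start", "end", "ref_allele", "alt_allele"]

def to_maflite(variants):
    """Serialize a list of variant dicts as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=MAFLITE_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for variant in variants:
        writer.writerow(variant)
    return buf.getvalue()

example = to_maflite([
    {"chr": "1", "start": 100, "end": 100, "ref_allele": "A", "alt_allele": "T"},
])
```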

By extension, Oncotator can be configured to annotate genomic data with HTML reports. In this BasicSomaticMutationCalling workflow, Oncotator populates an HTML report to the Workspace Data tab.

Inputs and Outputs

Below are the tool-specific inputs and outputs for this workflow.

ContEst Inputs

normalBamHG19

normalBamIndexHG19

tumorBamHG19

tumorBamIndexHG19

ReferenceFasta

ContESTIntervals

HapMapVCF

SNP6Bed

ContEst Outputs

ContEstTask.contaminationFile (Input to MuTect)

MuTect Inputs

normalBamHG19

normalBamIndexHG19

tumorBamHG19

tumorBamIndexHG19

ReferenceFasta

HapMapVCF

ReferenceFastaIndex

ReferenceFastaDict

COSMICVCF

DBSNPVCF

MutectIntervals

ContEstTask.contaminationFile (Output from ContEst)

MuTect Outputs

MutectTask.MAFLiteFile (Input to Oncotator)

MutectTask.CallStatsFile

Oncotator Inputs

MutectTask.MAFLiteFile (Output from MuTect)

OncoVCF

Oncotator Outputs

SAMPLE.vcf

oncotator.log

oncotator_out.html
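The hand-offs listed above (ContEst's contamination file feeds MuTect, and MuTect's MAFLite file feeds Oncotator) determine the order in which the tasks run. The sketch below derives that order from the task-to-task inputs only. The task name `OncotatorTask` and the dictionary layout are assumptions for illustration; `ContEstTask` and `MutectTask` are taken from the output names in the lists above.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each task maps to the upstream outputs it consumes (file inputs such as BAMs omitted).
TASK_INPUTS = {
    "ContEstTask": [],
    "MutectTask": ["ContEstTask.contaminationFile"],
    "OncotatorTask": ["MutectTask.MAFLiteFile"],  # hypothetical task name
}

def execution_order(task_inputs):
    """Return the tasks in an order where every producer runs before its consumers."""
    graph = {
        task: {ref.split(".")[0] for ref in refs}
        for task, refs in task_inputs.items()
    }
    return list(TopologicalSorter(graph).static_order())
```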

How to run this workflow in FireCloud

1. Clone the broad-firecloud-tutorials/MiniMutationCalling_V1_Tutorial workspace to run this workflow.

2. In your cloned workspace, navigate to the Method Configurations tab and click on the method, MiniMutationCalling.

3. Click Launch Analysis.

4. In the Launch Analysis window, toggle to pair and select a pair on which to run this workflow, e.g., HCC1143_pair_100_gene_250bp_pad. You can also run this workflow on a pair set by toggling to pair_set. Note: You must then type this.pairs in the Define Expression field.

5. Finally, click the Launch button. Check back on the Monitor tab after 30 minutes or so to view results from your workflow analysis.

6. When the status displays Done, click on the most recent analysis run to view outputs and results, e.g., HCC1143_pair_100_gene_250bp_pad (pair).

7. Click on Outputs: Show, then select output files to view the results of this analysis.

8. You can also view the Oncotator HTML report as an attribute in the Data tab.