Developed by the Data Science and Data Engineering group at the Broad Institute, the Genome Analysis Toolkit (GATK) offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its processing engine and high-performance computing features let it scale to projects of any size.


The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data. Its scope is now expanding to include somatic variant calling tools, and to tackle copy number variation (CNV) and structural variation (SV). In addition to the variant callers themselves, the GATK also includes many utilities for related tasks such as processing and quality control of high-throughput sequencing data.

These tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but they can be adapted to handle a variety of other technologies and experimental designs. And although it was originally developed for human genetics, the GATK has since evolved to handle genome data from any organism, with any level of ploidy.


The industry-standard GATK Best Practices

When you're isolating DNA in the lab, you don't treat the work as a series of isolated, disconnected tasks. Every task is a step in a well-documented protocol, carefully developed to optimize yield and purity and to ensure reproducibility and consistency across all samples and experiments. We believe sequencing data should be handled with the same rigor.

That's why GATK comes with complete reads-to-results Best Practices workflow recommendations, battle-tested in production at the Broad Institute and optimized for both accuracy and computational efficiency.

Best Practices for SNP and Indel discovery in germline DNA - leveraging groundbreaking methods for combined power and scalability.

Platform and requirements

The GATK is designed to run on Linux and other POSIX-compatible platforms. Yes, that includes MacOS X! If you are on any of the above, see the Downloads section for downloading and installation instructions. Windows systems are not supported. And no, there are no plans to port the GATK to Android or iOS in the near future ;-)

You will need to have Java 1.8 installed to run the GATK, and some tools additionally require R to generate PDF plots. Detailed version requirements and installation instructions for both can be found in the Documentation Guide.
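As a quick pre-flight check, the Python sketch below parses the output of java -version to confirm that Java 1.8 is the active version. It assumes the common version-string format printed by Oracle and OpenJDK builds (for example java version "1.8.0_151"); other vendors may format the line differently. The function names are hypothetical, and checking for R is left out.

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches the quoted version string in lines such as:
#   java version "1.8.0_151"
#   openjdk version "11.0.2" 2019-01-15
_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def java_major_version(version_output: str) -> Optional[int]:
    """Return the Java major version parsed from `java -version` output, or None."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report "1.<major>"; Java 9+ put the major version first.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major_version() -> Optional[int]:
    """Run `java -version` (it prints to stderr) and parse it; None if java is absent."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)


if __name__ == "__main__":
    major = installed_java_major_version()
    if major == 8:
        print("OK: Java 1.8 found")
    else:
        print(f"Java 1.8 is required; detected major version: {major}")
```

The sketch treats any version other than 8 as a mismatch, following the requirement stated above; the Documentation Guide remains the authority on exact version requirements.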

GATK versions up to and including 3.x were optimized to run in traditional research computing environments such as local clusters and servers. The next generation of GATK tools (GATK4, available today as an alpha preview) is being developed to run best in cloud environments and to leverage Apache Spark wherever possible.


So what's in the can?

At the heart of the GATK is an industrial-strength infrastructure and engine that handle data access, conversion and traversal, as well as high-performance computing features. On top of that lives a rich ecosystem of specialized tools, called walkers, that you can use out of the box, individually or chained into scripted workflows, to perform anything from simple data diagnostics to complex reads-to-results analyses. See the Tool Docs for a complete list of tools and their capabilities.
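To make the idea of chaining concrete, here is a minimal Python sketch of a scripted two-stage workflow, loosely modeled on the germline GVCF approach: HaplotypeCaller runs once per sample in GVCF mode, then GenotypeGVCFs joint-genotypes the per-sample results. The arguments follow GATK3 conventions, but the jar location, file names and sample list are hypothetical, so check each tool's documentation page for your version before relying on them. By default the script only prints the commands (a dry run).

```python
import shlex
import subprocess
from typing import List

JAR = "GenomeAnalysisTK.jar"   # GATK3 distribution jar; adjust the path for your install
REF = "reference.fasta"        # hypothetical file names throughout
SAMPLES = ["sampleA", "sampleB"]


def germline_steps() -> List[List[str]]:
    """Per-sample HaplotypeCaller in GVCF mode, then one joint GenotypeGVCFs step."""
    steps = []
    for sample in SAMPLES:
        steps.append(["java", "-jar", JAR, "-T", "HaplotypeCaller",
                      "-R", REF, "-I", f"{sample}.bam",
                      "--emitRefConfidence", "GVCF",
                      "-o", f"{sample}.g.vcf"])
    joint = ["java", "-jar", JAR, "-T", "GenotypeGVCFs", "-R", REF]
    for sample in SAMPLES:
        joint += ["-V", f"{sample}.g.vcf"]   # one -V per per-sample GVCF
    joint += ["-o", "joint.vcf"]
    steps.append(joint)
    return steps


def run_workflow(steps: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command in order; unless dry_run, also execute it and stop on failure."""
    printed = []
    for cmd in steps:
        line = shlex.join(cmd)
        print(line)
        printed.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a step fails
    return printed


if __name__ == "__main__":
    run_workflow(germline_steps())
```

Passing dry_run=False executes each step with check=True, so a failing step raises an exception and stops the chain before later steps consume incomplete output.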

Many GATK tools can be parallelized through multithreading for faster execution. See the GATK documentation article on parallelism for more details.
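In GATK3, multithreading is controlled by two engine arguments: -nt (number of data threads) and -nct (number of CPU threads per data thread), and the total thread count is their product. Not every tool supports both, so check the tool's documentation page. The helper below is a hypothetical Python sketch that appends these flags to an existing argument list and reports the expected total.

```python
from typing import List, Tuple


def add_threading(cmd: List[str], nt: int = 1, nct: int = 1) -> Tuple[List[str], int]:
    """Append GATK3 multithreading flags; return (new command, total threads).

    -nt  : number of data threads
    -nct : CPU threads per data thread
    Total threads used is nt * nct.
    """
    if nt < 1 or nct < 1:
        raise ValueError("thread counts must be >= 1")
    out = list(cmd)  # copy so the caller's list is not modified
    if nt > 1:
        out += ["-nt", str(nt)]
    if nct > 1:
        out += ["-nct", str(nct)]
    return out, nt * nct


if __name__ == "__main__":
    base = ["java", "-jar", "GenomeAnalysisTK.jar", "-T", "UnifiedGenotyper"]
    cmd, total = add_threading(base, nt=2, nct=4)
    print(" ".join(cmd), f"# {total} threads")
```

Flags equal to 1 are simply omitted, since a single thread is the default behavior.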


Command structure and tool arguments

GATK does not have a graphical user interface. All GATK tools are run from the command line using the same basic command structure. The -jar argument tells Java to launch the GATK engine itself, and the -T argument tells the engine which tool you want to run. Arguments like -R for the genome reference and -I for the input file are also given to the GATK engine and can be used with all the tools (see the complete list of available arguments for the GATK engine). Most tools also take additional arguments that are specific to their function. These are listed on each tool's documentation page, all easily accessible through the Tool Documentation index.
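The command structure described above can be sketched as a small Python helper that assembles the argument list in the documented order. The helper name, the default jar name GenomeAnalysisTK.jar (the GATK3 distribution jar) and the file names in the example are assumptions for illustration; tool-specific arguments such as -o should be checked against each tool's documentation page.

```python
from typing import Dict, List, Optional, Sequence


def build_gatk3_command(
    tool: str,
    reference: str,
    inputs: Sequence[str],
    tool_args: Optional[Dict[str, str]] = None,
    jar: str = "GenomeAnalysisTK.jar",
    java_opts: Sequence[str] = (),
) -> List[str]:
    """Assemble a GATK3-style command line as an argument list.

    Shape: java [java_opts] -jar <jar> -T <tool> -R <reference> -I <input>... [tool args]
    """
    cmd = ["java", *java_opts, "-jar", jar]  # -jar: Java launches the GATK engine
    cmd += ["-T", tool]                       # -T: the tool (walker) to run
    cmd += ["-R", reference]                  # engine argument: genome reference
    for path in inputs:
        cmd += ["-I", path]                   # engine argument; may be repeated
    for flag, value in (tool_args or {}).items():
        cmd += [flag, value]                  # tool-specific arguments, in insertion order
    return cmd


if __name__ == "__main__":
    import shlex
    print(shlex.join(build_gatk3_command(
        "HaplotypeCaller", "reference.fasta", ["sample.bam"],
        tool_args={"-o": "sample.vcf"}, java_opts=["-Xmx4g"])))
```

Run as a script, this prints java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I sample.bam -o sample.vcf. Returning a list rather than a single string lets the result go straight to subprocess.run without shell quoting issues.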

Free for academics, fee for commercial use

GATK is released under a mixed licensing model: researchers at academic and non-profit organizations using GATK for non-commercial purposes can access the tools and source code for free, while for-profit organizations are required to purchase a license.

Revenue from commercial licensing funds the growth of our support team and infrastructure to meet the community's demand for support, and lets us invest more resources in development speed, functionality and overall stability.

Academic non-commercial licensing

The text of the academic license can be viewed here. If your usage qualifies for this license, you can download the program and start using it right away.

Commercial licensing

We provide licensing directly to commercial/for-profit organizations that will be running the GATK or MuTect internally or as part of their own hardware offering. To inquire about licensing GATK for commercial use and/or redistribution of GATK as a service, please contact softwarelicensing@broadinstitute.org.

Don't panic! Help is at hand

The GATK has a reputation for being wicked complicated, and it's not entirely undeserved. With great power comes great responsibility (well, complexity)... But we're here to help.

Need more? We have more

Be sure to check out the Presentations from our recurring workshop series. In addition to the slide decks, we provide recordings of the workshops we hold at the Broad; you can view them on the Broad website or on the Broad education channels on YouTube and iTunes U.

Finally, if you've exhausted all these avenues and still haven't found the answer to your question, check out the forum! You may find that others have run into the same problem and that the solution has already been posted. If not, let us know and we'll do our best to address your problems quickly and accurately. If something's not clearly documented, we'll answer your question and improve the docs accordingly. If you think you found a bug, we'll track it down and fix it. Just ask the team.