Workshop: Variant Discovery in Next-Generation Sequencing (NGS) Data

The emergence of next-generation sequencing (NGS) technology has fundamentally altered the way we conduct eco-evolutionary research. This has led to an increasing demand for highly skilled researchers who can effectively analyze and manage such large-scale data sets. Within this framework, a 2-3 day workshop will be hosted at the Royal Belgian Institute of Natural Sciences in Brussels, Belgium, where invited instructors from the Broad Institute (Cambridge, MA) will outline how to process NGS data using the Broad’s freely available Genome Analysis Toolkit (GATK).

GATK is a versatile structured programming framework offering a variety of tools for processing NGS data, with a key focus on data quality control and on correctly calling SNPs and indels. GATK can handle basic actions such as data access and conversion, but it also includes a set of specialized tools, called "walkers", that you can use out of the box, individually or chained into scripted workflows, to perform anything from simple data diagnostics to complex "reads-to-results" analyses.

During this workshop, instructors will focus on the core steps involved in calling variants with the GATK “Best Practices” workflow. You will learn why each step is essential to the calling process, what key operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset.

The workshop consisted of lecture-style sessions (first two days) during which the GATK development team explained the rationale, theory and real-life applications of the ‘Best Practices’. During the optional hands-on sessions (third day) the GATK team helped each participant work through interactive exercises and tutorials in which they applied the ‘Best Practices’ to real datasets.

This workshop was organized within the framework of the IUAP-project SPEEDY (BELSPO), the BRAIN-project ‘GENESORT’ (BELSPO) and the Belgian Network for DNA Barcoding, BeBoL (FWO, JEMU), from which it received financial support.

12:30 The benefits of analyzing cohorts of samples rather than single samples

13:00 Lunch break

14:00 Quality control of inputs and outputs

14:20 Benchmarking results with standard resources

14:40 Resources, documentation & support

15:00 Coffee break / question time

15:30 Parallelism options in the GATK

15:50 Building pipelines with Queue

16:30 End

Thursday, 26 June 2014 Hands-on exercises (max. 25 participants)
We go through the Best Practices step by step using real data sets. The session is aimed mostly at beginners, but basic familiarity with command-line tools is expected. Participants must bring their own laptop running Linux or Mac OS X (on a Windows machine, you can install a virtual machine such as VMware Player).