This presentation will describe several case studies to highlight the bioinformatics challenges we face when analyzing NGS data, the computational infrastructure required to enable such analyses, and the analysis algorithms and strategies used to solve the problems at hand. From our early successes (and failures) we have already learned crucial lessons that will help to maximize the impact of future NGS projects, and to prepare for third generation sequencing technologies.

The Stanford Genome Technology Center is a world-leading genomics facility bridging the gap between genomics and medical care. Our data flow and analysis pipelines are integrated to deliver high throughput with simultaneous analysis. By applying queuing theory and sorting algorithms, we have achieved a five-fold increase in throughput and a significant reduction in IT infrastructure cost. This case study discusses our next-generation sequencing pipeline and illustrates the algorithms and software behind these performance gains and cost savings.
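The general principle of coupling instrument output to analysis through a queue, so that neither stage idles while the other works, can be sketched as follows. This is a generic producer-consumer illustration, not the Center's actual pipeline; all names and the batch counts are illustrative:

```python
import queue
import threading

def run_pipeline(n_batches, work=lambda b: b):
    """Overlap 'sequencing' (producer) with 'analysis' (consumer) via a
    bounded queue, so batches are analyzed while others are still produced."""
    q = queue.Queue(maxsize=4)   # bounded buffer applies back-pressure
    results = []

    def sequencer():
        for batch in range(n_batches):
            q.put(batch)         # blocks when analysis falls behind
        q.put(None)              # sentinel: no more batches

    def analyzer():
        while True:
            batch = q.get()
            if batch is None:
                break
            results.append(work(batch))

    t1 = threading.Thread(target=sequencer)
    t2 = threading.Thread(target=analyzer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results

print(run_pipeline(5))  # -> [0, 1, 2, 3, 4]
```

The bounded `maxsize` is the queuing-theory knob: it caps in-flight work so storage and memory stay predictable while both stages run concurrently.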

Murali Ramanathan, Director of Graduate Studies, Pharmaceutical Sciences and Neurology, State University of New York

The risk of developing many complex diseases is related to the interactions of environmental factors with genes. Effective and efficient methods for identifying and modeling gene-environment interactions (GEI) are critical for medical discovery from next generation sequencing studies. However, GEI analysis is a combinatorially explosive problem. I will describe AMBIENCE and related GEI analysis algorithms that use novel information-theoretic metrics to search the combinatorial space. I will also demonstrate how novel data-intensive supercomputing architectures can enhance computational efficiency in these applications.
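One of the information-theoretic metrics used in AMBIENCE-style GEI analysis is the k-way interaction information (KWII), which is positive when variables act synergistically. The sketch below is a minimal illustrative computation on toy data, not the AMBIENCE implementation; the XOR example and variable names are our own:

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (bits) of a sequence of discrete values/tuples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def kwii(g, e, p):
    """Three-way interaction information (KWII convention): positive values
    indicate synergy among gene, environment, and phenotype."""
    return (-entropy(g) - entropy(e) - entropy(p)
            + entropy(list(zip(g, e)))
            + entropy(list(zip(g, p)))
            + entropy(list(zip(e, p)))
            - entropy(list(zip(g, e, p))))

# Toy example: phenotype is XOR of gene and environment, a pure interaction
# that neither marginal variable predicts on its own.
gene = [0, 0, 1, 1] * 25
env  = [0, 1, 0, 1] * 25
phen = [g ^ e for g, e in zip(gene, env)]
print(kwii(gene, env, phen))  # -> 1.0 (maximal synergy for binary variables)
```

A combinatorial search would rank candidate (gene, environment) pairs by such a metric; the explosion the speaker mentions comes from enumerating these tuples across all variables.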

Sponsored by
12:30 Luncheon Presentation
Selecting A LIMS for Next Generation Sequencing Research

Few industries have seen processing speeds rise and costs drop as dramatically as genomics. Modern genomics labs are now struggling to manage the data these techniques generate. A recent survey cites data storage, data management, and informatics as the biggest hurdles to expanding next-gen sequencing (NGS). Moreover, analysis costs for sequencing remain high, spotlighting the need for better ways to centralize information and track samples across experiments. This talk reviews the informatics challenges presented by NGS and proposes three criteria that labs should assess when selecting an NGS laboratory information management system (LIMS).

The massive scale of next-generation sequence data often forces analysts to make compromises between sensitivity and specificity, accuracy and speed, and so on. How can an analyst be certain they are making the right choices? This presentation will discuss a combined computational and laboratory framework that allows for unprecedented exploration of the computational variable space (tools and their parameters), ensuring optimal analysis pipelines are employed for each data set.
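The kind of parameter-space exploration described above can be sketched as a grid search scored against a known truth set. Everything below is illustrative: the mock caller, its two parameters, and the synthetic data are stand-ins, not the framework the speaker presents:

```python
import itertools
import random

random.seed(0)

# Toy ground truth: 1000 candidate sites, 100 true variants with higher signal.
truth = [True] * 100 + [False] * 900
signal = [random.gauss(5, 1) if t else random.gauss(3, 1) for t in truth]

def call_variants(threshold, min_support):
    """Mock caller: a site is called if its signal clears the threshold.
    min_support stands in for a second tunable parameter."""
    return [s >= threshold + 0.1 * min_support for s in signal]

def f1(calls):
    """F1 score: harmonic mean of precision and recall against the truth set."""
    tp = sum(c and t for c, t in zip(calls, truth))
    fp = sum(c and not t for c, t in zip(calls, truth))
    fn = sum(t and not c for c, t in zip(calls, truth))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Exhaustive sweep of the (small) parameter grid, scoring each combination.
grid = itertools.product([3.0, 3.5, 4.0, 4.5], [0, 2, 4])
best = max(grid, key=lambda p: f1(call_variants(*p)))
print("best (threshold, min_support):", best)
```

In practice the "truth set" comes from laboratory validation, and the grid spans tools as well as parameters, but the selection logic is the same: score every configuration and keep the one that best balances sensitivity and specificity.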

Sponsored by
2:45 How Many Indels Are You Missing? Highly Accurate Variant Analysis in Diagnostic Applications with Omixon Variant Toolkit

Sponsored by
3:00 Accurate Complete Sequences of Over 1000 Human Genomes including an Ethnically Diverse Reference Panel Released to the Public

Steve Lincoln, Vice President of Scientific Applications, Complete Genomics

We have developed a custom sequencing platform which can inexpensively produce high-depth sequences of complete human genomes at large scale. Sophisticated and specialized bioinformatics algorithms leveraging local de novo assembly allow this system to achieve high sensitivity and specificity in discovering SNPs, indels, and structural variants. We have applied these methods to family studies of simple and complex disease, cell biology studies, and the study of somatic mutation in cancer. We will review progress on the platform to date and focus on a set of over 60 ethnically diverse genomes which have been generated for release to the public.
3:15 Refreshment Break in the Exhibit Hall and Poster Viewing

We have implemented a BioIT World award winning knowledge management platform - tranSMART - supporting translational research. The initial focus was to combine clinical, genomics and proteomics data from clinical and non-clinical studies. We now are extending the system to support biomarker discovery using genetics data - in particular SNP chips and next-generation sequencing. In this talk we will present how this open source system is being extended and initial success will be highlighted.

Victor Jin, Ph.D., Assistant Professor, Department of Biomedical Informatics, The Ohio State University

This talk presents a novel algorithm based on a bi-asymmetric-Laplace model (BALM) to analyze both ChIP-seq and MBD-seq data. In tests on publicly available TF ChIP-seq data, the algorithm achieved better accuracy than other tools; we also applied it to MBD-seq data from MCF7 breast cancer cells. The results demonstrate the algorithm’s ability to distinguish closely positioned target sites and to accurately predict DNA methylation regions. This study suggests that BALM may provide another useful tool for the sequencing user community.
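As a rough illustration of the bi-asymmetric-Laplace idea (not the published BALM implementation; the function names and parameter values below are our own), one can model forward- and reverse-strand read pile-ups as mirrored asymmetric Laplace components flanking a binding site:

```python
import math

def asym_laplace_pdf(x, mu, lam, kappa):
    """Asymmetric Laplace density: location mu, scale lam, asymmetry kappa.
    kappa != 1 gives different decay rates left and right of the mode."""
    c = lam / (kappa + 1.0 / kappa)
    if x < mu:
        return c * math.exp((lam / kappa) * (x - mu))
    return c * math.exp(-lam * kappa * (x - mu))

def strand_profile(x, site, shift, lam, kappa):
    """Toy bi-asymmetric profile: forward-strand reads pile up ~shift bp
    upstream of the site; reverse-strand reads mirror them downstream."""
    fwd = asym_laplace_pdf(x, site - shift, lam, kappa)
    rev = asym_laplace_pdf(2 * site - x, site - shift, lam, kappa)  # mirror
    return 0.5 * (fwd + rev)

# The profile is bimodal around the true site; fitting such a model per
# candidate region is what lets a caller separate closely positioned sites.
site, shift = 1000, 50
print(strand_profile(site - shift, site, shift, 0.05, 1.0) >
      strand_profile(site, site, shift, 0.05, 1.0))  # -> True
```

Fitting `site`, `shift`, `lam`, and `kappa` to the observed read-start positions (e.g. by maximum likelihood) would then yield a sub-peak-resolution estimate of the binding or methylation site.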

Sponsored by
4:45 The Pipeline Pilot NGS Collection: A New Approach to the Challenges of NGS Data Analysis

Clifford Baron, Product Marketing Director, Accelrys
In repeated surveys, scientists using next generation sequencing technologies report that data analysis is their greatest challenge and the most significant impediment to continued market growth. This is so despite the availability of over a dozen commercial software offerings and literally hundreds of public-domain NGS algorithms, with more appearing weekly. The most frequently discussed factor contributing to the data analysis challenge is the sheer volume of data generated. Just as significant, though less frequently acknowledged, are the rapid evolution of available algorithms and attendant computational best practices, and the need for techniques tailored to specific research goals. We discuss how Pipeline Pilot, a widely used commercial software system for the rapid development and deployment of computational pipelines, can be used along with a newly released collection of NGS analysis components to address these fundamental challenges.

There is an increasing disconnect between the ability to generate sequence data using second- and third-generation methods and the ability to interpret what that data means. In tumor DNA sequencing, for example, many common mutations are being found across cancers, but some mutations are detected in the same cancers by certain sequencing techniques and not by others. This presentation will explore why this happens and what it means.

Sponsored by
12:00 pm Dell Next Generation Bioinformatics and Research Computing Solutions: The Power to Do More Science

Jose Alvarez, Business Development Manager, HPC Solutions, Dell

Utilizing high-performance, purpose-built building blocks, Dell is simplifying research computing. Dell has created an ecosystem that helps research groups accelerate their time to results and enhances the user experience by simplifying reference architecture, deployment, and integration. Dell has also partnered with Next Generation Sequencing (NGS) industry leaders and instrument vendors to deliver an array of solutions that facilitate the collection and analysis of NGS data. With an array of high-performance storage and archival solutions, Dell has simplified the retention and management of the NGS data life cycle. In this short presentation the Dell Life Sciences Research Computing team will give a snapshot of the ecosystem that gives researchers the power to do more science.

Sponsored by
12:15 From NextGen Sequencing Data Management to 4th Generation Sequencing

Michael Hehenberger, Ph.D., IBM, T.J. Watson Research Center

IBM is currently working with leading sequencing centers on the data management challenges posed by whole-genome sequencing activities. We will show how leading-edge hardware and software solutions can address these extreme requirements. In addition, IBM Research has partnered with Roche 454 to develop a new "DNA Transistor"-based sequencing technology. While the technical challenges are significant, the partners are optimistic about the success of this exciting project.

12:30 Luncheon in the Exhibit Hall and Poster Viewing

1:55 Chairperson’s Remarks

Kevin Davies, Ph.D., Editor-in-Chief, Bio-IT World

2:00 Exhibit Hall Closes

2:00 Using Next-Gen Analysis to Improve Cancer Treatment Decisions

Paul Aldridge, CIO, Genomic Health

This presentation will cover various use cases for next generation sequencing data and analysis in research on cancer treatment efficacy. Attendees will gain a broader understanding of costs and other considerations when choosing among approaches that enable R&D researchers to make more discoveries.

Sequencing Informatics Trends
and New Applications

2:30 NGS-AaaS: Next Generation Sequencing-Annotation as a Service

Robert Haines, University of Manchester, UK

Next Generation Sequencing technologies bring genome-wide sequencing within the reach of a greater number of research labs. The $1000 genome, however, is accompanied by the $100,000 analysis. How do we keep down the cost of analytics? How do we enable labs with limited bioinformatics capability or local compute provision to benefit from NGS? Scientific workflow systems can be used for assembly and annotation pipelines. Focusing on the latter, Manchester, together with partners in Liverpool and Eagle Genomics Ltd, is using the commercial Amazon EC2 cloud and the open source Taverna workflow system to operate an on-demand, low-cost, online analytics service for DNA analysis. As a case study we will present an AaaS application for understanding genetic variation between cattle breeds.

3:00 Sequencing without a Sequencer: How Buying Lanes Can Beat Buying a Machine

What are the economics of buying sequencing services vs. owning your own lab? How can you mix internal operations with contracted ones? What are potential issues in vendor performance? What are the trade-offs of accessing multiple sequencing platforms through vendors? This talk will focus on the economic & operational issues around contracting for sequencing & analysis services including vendor selection issues, vendor experiences, and opportunities.

The Expression Atlas is a cloud-based distributed infrastructure for organizing and querying multi-omics data. Built upon the open-source Expression Atlas project at the EBI in partnership with the pharmaceutical industry, the Atlas provides a scalable solution that can be deployed on in-house servers or accessed remotely in the cloud. Learn how the Atlas handles secure processing, combined analysis, and integration of public and private transcriptomic and proteomic data, with an emphasis on our novel pipeline for next-generation sequencing data processing and reporting.