Category: Announcement

At DNAnexus we focus on enabling scientific research and discovery by removing barriers such as data compliance and security concerns, allowing our global community of customers to focus on innovation, not logistics. DNAnexus is the only cloud-based translational informatics platform that offers regional services in China, North America, Europe, and Asia Pacific. We are now pleased to offer our customers a second region in Europe: Microsoft Azure’s West Europe.

Data generated in the EU is subject to strict European privacy regulations. DNAnexus has adopted policies and implemented new procedures that comply with these regulations. Consequently, DNAnexus users based in European countries can upload genomic data from individuals in those countries to DNAnexus in compliance with European data privacy laws (e.g. GDPR) and with regulations regarding data localization.

The additional region in Europe will address the needs of the growing number of European BioPharma and Precision Medicine customers, allowing them to seamlessly access the Azure cloud platform via DNAnexus while complying with European regulations, including the GDPR and other data privacy regulations. Organizations will be able to collaborate with research centers in the EU and across the globe to analyze biomedical data on one common platform, while still maintaining compliance with European regulations. The DNAnexus Platform gives data owners the ability to set up compliance policies and enforce local and regional restrictions around their data, eliminating the confusion and potential risks of collaborating globally.

Other security features enable researchers to log, monitor, and block any unauthorized data access and prevent users from downloading or copying data outside of their regional boundaries. These features, in addition to industry-leading data management, workflow and data provenance tracking features of the DNAnexus Platform, provide an environment to work with large-scale biomedical and genomics data that complies with EU GDPR and other data privacy regulations, enabling customers to focus on scientific discovery – not compliance.

You can find out more about DNAnexus expansion and offerings at Bio-IT World. Visit DNAnexus in booth #310 and learn more about our conference activities on our blog. You can also stop by our demo session at Microsoft booth #446 on Wednesday, May 16th at 3:30pm.

In this blog post, we discuss Edico Genome’s DRAGEN Bio-IT Platform for rapid secondary analysis. We benchmark DRAGEN for speed and accuracy on diverse WGS datasets. Finally, we detail how Edico Genome and DNAnexus collaborated to improve the DRAGEN pipeline's performance on noisy datasets and PCR samples in the newest version.

DNAnexus recently released a method to generate real, noisy NGS data called Readshift (blog) – (code). Edico Genome has produced a new version of their WGS tools, labeled DRAGEN V2+, which we evaluate on Readshift.

First, we present evaluations on an HG002 benchmark dataset which was never made available to Edico Genome, to ensure that the improvements apply generally. This set of benchmarks uses 35X PCR-free WGS data with the hs37d5 reference. Evaluation is performed using the same methods as used on precisionFDA. We compare DRAGEN V2 (the prior version) to DRAGEN V2+ to demonstrate Edico Genome’s rapid improvement.

DRAGEN-Scale: How Fast is DRAGEN?

Figure 1 compares the execution speed of DRAGEN with that of popular pipelines executed on DNAnexus. For many of these apps (e.g. GATK3), DNAnexus has applied additional optimizations to improve parallelism, meaning they are faster than they would be on local infrastructure. When time is critical, DRAGEN V2 and V2+ are the clear leaders.

DRAGEN accelerates both the mapping process and variant calling, which can be run independently. This allows users to mix-and-match if a specific variant caller is required. Upstream of the secondary analysis, Edico Genome makes an accelerated BCL2FASTQ tool which greatly improves speed and efficiency while producing identical FASTQs.

How Accurate is DRAGEN?

Figures 2 and 3 demonstrate that DRAGEN’s speed does not come at the expense of accuracy. The newest version of DRAGEN achieves a ~40% reduction in SNP Error rate and ~50% reduction in Indel Error rate. Edico Genome was also recently designated as one of the winners of the precisionFDA Hidden Treasures Challenge.
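As a rough illustration of what a figure like "~40% reduction in SNP error rate" means, here is a minimal sketch of the arithmetic. It assumes that errors are counted as false positives plus false negatives against a truth set, which is a common convention but not something this post spells out. The function names and the counts are hypothetical and are not the benchmark results shown in the figures.

```python
def total_errors(false_positives: int, false_negatives: int) -> int:
    """Count errors as FP + FN (an assumed convention, not taken from the post)."""
    return false_positives + false_negatives


def relative_error_reduction(old_errors: int, new_errors: int) -> float:
    """Fraction of the old version's errors that the new version eliminates."""
    if old_errors <= 0:
        raise ValueError("old_errors must be positive")
    return 1.0 - new_errors / old_errors


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the calculation.
    old = total_errors(false_positives=600, false_negatives=400)
    new = total_errors(false_positives=350, false_negatives=250)
    print(f"Relative error reduction: {relative_error_reduction(old, new):.0%}")
```

Expressing the change relative to the prior version's error count, rather than as a difference in precision or recall, makes improvements easier to compare when both versions are already highly accurate.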

How to Train Your DRAGEN – New Improvements in DRAGEN V2+

The recent DNAnexus Readshift blog post compared pipeline performance in a number of noisy conditions. After this blog, DNAnexus and Edico Genome discussed improvements based on the findings of Readshift. Edico Genome rapidly iterated several new development versions which we built as DNAnexus Platform apps, evaluated, and discussed with Edico Genome.

In both the Readshift blog and an earlier DeepVariant blog, we noticed that certain HiSeqX samples caused much worse indel calling performance. We subsequently realized that these samples were the only ones that were not PCR-free. The use of PCR is required in several cases, e.g. small DNA inputs, as well as with the new fast and automated-prep Illumina Nextera Flex kits. Strelka2’s strong results suggest that Illumina took special care to consider performance in PCR samples, and that other callers could learn from this. DRAGEN’s performance now matches Strelka2’s on these samples, leading the pack for indel calling on PCR samples (indels being the error mode that dominates in these samples).

Improved Robustness Against Low-Quality Reads

Readshift’s main focus was to understand how calling performance degrades as the quality of a sequencing run decreases. DRAGEN V2+ is not only more accurate, it also resists the effect of lower-quality reads up to the most extreme shift of +2.0 std.

* Discussion with Chris Saunders of Illumina indicates that a specific heuristic for SNP calling in Strelka2 may be interacting with Readshift. We are working to make a version of Readshift which will not trigger this heuristic for a more accurate Strelka2 evaluation.

Faster Runtime on NovaSeq Samples

Readshift identified that low-quality NovaSeq data can lead to dramatically longer runtimes (or program crashes). Brad Chapman has completed an excellent investigation into the use of read trimming in somatic calling that may help with this problem and more broadly.

Edico Genome has made improvements to the runtime of DRAGEN on NovaSeq samples across the board, demonstrating the ability to quickly improve for new data types.

However, the relative slowdown in low-quality NovaSeq samples remains. DNAnexus and Edico are continuing discussions on how to improve this issue.

Future Directions

In addition, Edico Genome recently released its new DRAGEN Virtual Long Read Detection (VLRD) Pipeline (Coming Soon to DNAnexus), designed to achieve greater accuracy in segmental duplications than standard variant callers. In this pipeline, Edico Genome takes advantage of DRAGEN's speed to apply computationally intensive assembly-based techniques and to jointly call all regions that are similar. We hope to investigate this method (and these difficult, but important, regions of the genome) more deeply in a future blog. Based on the responsiveness of Edico Genome in these collaborations, we are quite convinced there will be a:

How to Train your DRAGEN 2

Edico Genome’s DRAGEN is available now as an easy-to-use app on DNAnexus. To pilot DRAGEN V2+ on DNAnexus in your workflow, email edico@dnanexus.com.

We’re partnering in an exciting new collaboration with St. Jude Children’s Research Hospital and Microsoft to analyze and store half a petabyte of pediatric cancer genomic data. This collaboration will accelerate discoveries and treatments to cure pediatric cancer and other rare diseases by giving researchers and clinicians the ability to collaborate globally and enabling the rapid generation and analysis of genomic data.

DNAnexus, deployed on Microsoft Azure, provides a secure and agile ecosystem in the cloud while simultaneously eliminating security, storage and speed limitations – all of which will enable St. Jude researchers to focus on complex problems on a collaborative, global scale.

DNAnexus’ strength comes from its agile co-development process. We partner with our customers to solve new big data problems that are continuously evolving. Our team works closely with the St. Jude and Microsoft teams to determine the specific requirements and translate them into tailored solutions. From kick-off meeting to production deployment, it's a seamless process that helps our customers and collaborators achieve their goals, no matter how ambitious.

With our secure, cloud-based infrastructure and complementary tools, researchers will be able to integrate a multitude of disparate datasets, develop their own tools, and collaborate in a secure environment, enhancing the sharing of data and accelerating discoveries.

You can read more on how we’ve joined forces to fuel scientific discovery in a joint press release from St. Jude here. Microsoft has also written a great blog post where you can learn more about Microsoft Genomics Service and the partnership.

We’ll be at the HIMSS 2018 Conference in Las Vegas, Nevada, from March 5th to March 9th. You can find us at Microsoft’s booth #3832 as part of the larger Microsoft patient journey, which presents solutions that enable more precise treatment and better patient outcomes.

Visit us at Microsoft booth #3832 and schedule a meeting with our team – email us at himss18@dnanexus.com.

About DNAnexus

DNAnexus provides a global network for sharing and management of genomic data and tools to accelerate genomic medicine. The DNAnexus cloud-based platform is optimized to address the challenges of security, scalability, and collaboration, for organizations that are pursuing genomic-based approaches to health, in the clinic and in the research lab.

The DNAnexus team is made up of experts in computational biology and cloud computing who work with organizations to tackle some of the most exciting opportunities in human health, making it easier—and in many cases feasible—to work with genomic data. With DNAnexus, organizations can stay a step ahead in leveraging genomics to achieve their goals. The future of human health is in genomics. DNAnexus brings it all together.