A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery

Friday, September 05, 2014

Oxford Takes Some Flak, Fires Back

A huge event in the genomics community this summer has been the Oxford Nanopore MinION Access Program (MAP), which has enabled a sizable but select group of researchers to try out ONT's novel nanopore-based sequencing technology. While results and rumors have periodically drifted out over the summer, this week saw three disclosures, one of which resulted in fireworks and action.

From a sheer technological standpoint, ONT's MinION device is a wonder. Prior single-molecule sequencers have been table-sized; ONT's has the form factor of a USB stick. None of that matters if it can't sequence, so the question is whether it can. With the MAP, researchers around the world have been trying to find out, and the general response has been successful sequencing (our group had a read align to all of lambda out of our first run -- that's right, >48Kb of alignable sequence in a single read!) but with a lot of glitches.

ONT took on a large task trying to deliver a complex system to many labs of varying experience around the world, and Murphy's Law has struck quite frequently -- but there is also great evidence ONT is learning from their trials. Since ONT has never advertised the MAP as delivering a finished product, but rather as an opportunity to try a not-yet-finished system, this doesn't seem surprising. There's also been an active effort by the MAP community to share tips and tools: two software package announcements have already cleared peer review. Perhaps if that had occurred to me, I would have submitted the first tool for read extraction that showed up on the community, which a few other MAPpers found useful.

The first events this week were talks at a UK conference by Oxford CTO Clive Brown and MAP participant (and admitted MinION fanboy) Nick Loman. Brown's talk discussed the technology in some detail and outlined where it may be going. He also described some of the experiences of the MAPpers and how ONT is trying to reduce the platform's performance variation as well as enable simpler library prep and working directly out of biological samples. Loman described his group's experience with the MinION -- warts and all -- and mentioned it is already being used for real molecular epidemiology studies. A good week for MinION.

So it was a large bucket of cold water -- figuratively, not on YouTube -- when a pre-print showed up titled "A first look at the Oxford Nanopore MinION sequencer" by Alexander Mikheyev and Mandy Tin, along with a GitHub repository of data and scripts (as well as a free copy of the pre-print). The article took a uniformly negative view of system performance in terms of accuracy and usable data production.

Benchmarks are a good thing, but it can be strongly argued that this one was grossly rushed and premature. In particular, while the article doesn't state the chemistry used, the timing of the paper clearly shows this work was performed using the initial shipment of flowcells and kits. ONT was open that this chemistry was not their best, but it was ready to ship. The second set of flowcells delivered their next chemistry, and the next set will have a tweak on that. Oxford is rapidly iterating on every aspect of their platform: library preparation, protocols, flowcells and base callers, but this paper is based on the very first version.

More troublesome is a conclusion section which essentially states that while sequencers have historically seen huge advances in throughput, the accuracy doesn't change much over time. That, in my mind, is grossly overstated, and indeed the second batch of flowcells delivered much improved performance versus the first ones. In particular, accuracy on the MinION is greatly affected by whether the system can read one or both strands of the library molecule, and nobody in the first round saw good production of 2D reads (those computed from both strands). This was significantly improved in the second round, and there is good reason to think that further refinement of the library preparation protocols will further improve this. This is a key concept that Mikheyev & Tin ignore in their commentary: library preparation has a much greater effect on sequence quality in the ONT platform than in most other sequencing platforms.

Mikheyev and Tin also make some projections of throughput, and come to the conclusion that an absurd number of devices would be required to resequence E. coli -- which is in stark contrast to the data Nick Loman presented showing resequencing and variant-calling on bacterial genomes, as well as several other datasets floating about. Again, a few tweaks to the platform can ratchet the performance by quanta, which can radically reshape one's views of what is practical -- unless those views are fossilized in a print journal.

Herein lies the problem: there is no scientific equivalent of SnapChat. Mikheyev & Tin have written a caustic review of a new technology, a review that was obsolete before it was published, but it will be in the record. With luck, it won't be read by many decision makers, but sadly it probably will be. MinION is clearly a disruptive technology in a very early stage, and like the first telephones has highly variable performance. But it also is not a launched technology: it's a bit odd to see a paper that basically says "don't buy X", when you can't buy X unless you sign up for a program that emphasizes that X isn't ready for prime time.

ONT has apparently booted Mikheyev from the MAP, which is hardly surprising given that ONT is essentially giving free product out in exchange for constructive criticism and assistance. Neither of these makes an appearance in the paper. In addition, the Mikheyev publication is technically in violation of the MAP terms, which stated no data release until you signed off that MinION was sufficient for your purposes. Unless your purpose is to slam MinION, this wasn't satisfied. Prior advance access programs for genomic technologies have generally involved a select set of elite genome centers and ironclad legal contracts controlling data release and comment. ONT tried a lighter weight approach, relying on good faith and a large crowd. Sadly, a likely outcome of this affair is that future access programs, ONT's or others, will head back towards more lawyers and fewer sites.

6 comments:

Anonymous said...

I like the abstract comment "at best 890±1,932 bases per mapped read". It highlights the rushed nature of the report. Surely "at best" is the upper bound 890+1932=2822? Still, imagine every base in every read mapped. Then their computation averages and computes a variance across reads. So the statement wouldn't represent the variability in the mapping, just the variability in the input DNA length. Surely a proportion mapped would better suit? Or would it? If the reads contain large sections that don't map, for whatever reason, then the proportion would again average over non-homogeneous mappings. What should be reported then? Chunk the data into 500mers and report the mode together with a measure of spread?
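To make the point about summary statistics concrete, here is a minimal sketch in Python. The read lengths and mapped-base counts are invented for illustration and are not data from the pre-print; the `summarize` helper and its field names are likewise hypothetical. It shows how one very long read can push the standard deviation above the mean, and how a median or a mapped fraction describes the same reads differently.

```python
import statistics

# Hypothetical reads as (read_length, mapped_bases) pairs.
# These numbers are invented for illustration; they are NOT from the pre-print.
reads = [
    (400, 120), (500, 150), (600, 200), (700, 250), (900, 300),
    (1000, 400), (1200, 500), (1500, 700), (2500, 1500), (9500, 9000),
]


def summarize(reads):
    """Return several alternative summaries of mapped bases per read."""
    mapped = [m for _, m in reads]
    fractions = [m / length for length, m in reads]
    return {
        # Mean and sample standard deviation, the "x ± y" style of report.
        "mean_mapped": statistics.mean(mapped),
        "sd_mapped": statistics.stdev(mapped),
        # Median is less sensitive to a single very long read.
        "median_mapped": statistics.median(mapped),
        # Per-read fraction of bases that mapped, summarized by its median.
        "median_fraction": statistics.median(fractions),
        # All mapped bases divided by all sequenced bases.
        "pooled_fraction": sum(mapped) / sum(length for length, _ in reads),
    }


summary = summarize(reads)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With a skewed set of lengths like this one, the standard deviation exceeds the mean, so "mean ± SD" mostly reflects the spread of read lengths rather than how well reads map. None of these summaries addresses the non-homogeneous mapping within a read; that would need the chunked approach suggested above.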

One bad review won't kill an early access program. A decision to bring in ONT as a possible sequencing technology requires a risk assessment about a nano-tech system: that requires a broad understanding of the state of the technology, the financial backing provided by ONT, the pain points that sequencing technologies exhibit, and the opportunities that the market provides. One bad review is irrelevant in this evaluation, so I am not worried about its impact. The problem ONT is trying to solve is deep, but its impact would be massive, transformative, and everlasting. Let's hope the financial backers are in it for the long haul.

Thanks Keith for an informative post. I was surprised to see such a negative paper (as you mention, 'constructive criticism' is absent) published.

Nice to see that there have been many rapid iterations for a platform that is widely watched. Although one group's early experience is going 'on the record', it can certainly be easily discounted by additional publications with more recent chemistry.

Interesting point about the library prep's effect on the sequencing. Any thoughts on why this is the case? (I.e. is that same pronounced effect seen on PacBio data, or is this something unique to the nanopore method?)

About Me

Dr. Robison spent 10 years at Millennium Pharmaceuticals working with various genomics & proteomics technologies & working on multiple teams attempting to apply these throughout the drug discovery process. He spent 2 years at Codon Devices working on a variety of protein & metabolic engineering projects as well as monitoring a high-throughput gene synthesis facility. After a brief bit of consulting, he rejoined the cancer drug discovery field at Infinity Pharmaceuticals in May 2009. In September 2011 he joined Warp Drive Bio, a startup applying genomics to natural product drug discovery. Other recurring characters in this blog are his loyal Shih Tzu Amanda and his teenaged son alias TNG (The Next Generation).
Dr. Robison can be reached via his Gmail account, keith.e.robison@gmail.com
You can also follow him on Twitter as @OmicsOmicsBlog.