A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery

Wednesday, March 04, 2015

Can BGI Really Stir Up the Sequencing Instrument Market?

I've been asked several times recently about rumors coming out of BGI. They've started claiming they have a super sequencer which will radically beat Illumina's offerings on both cost and accuracy. The recent 10K Genomes meeting apparently had a quick talk from BGI which led to some limited Twittering, and judging from this Mendel's Pod interview at least one person believes the buzz (though the same individual quotes a price per PacBio human genome that is high by at least a factor of 25). The claim is that this summer at ESHG BGI will release two boxes: one a benchtop model, on which I haven't seen any details, and the other claimed to offer throughput superior to a HiSeq with better accuracy. What might be backing up these claims?

The most obvious speculation would be technology based on Complete Genomics (CG). CG had gone for a model of building factories rather than shippable instruments. If this is the core technology, then a bunch of design decisions probably needed to be revisited, but that's life in the engineering world. If BGI really intends to sell boxes, rather than sequencing-as-a-service, they'll also have to deal with shipping their consumables around the world, a difficult task as Oxford Nanopore has discovered. Virtually everything imaginable can go wrong: boxes are crushed, customs holds up shipments, things are shaken, novel interpretations are read into protocols, etc. It is very hard to project a technology across the world!

CG's technology relied on sequencing-by-ligation, with the innovative "rolony" (aka nanoball) template amplification. Rather than using emulsion PCR (454, SOLiD, Ion) or bridge amplification PCR (Illumina), CG's approach uses a localized rolling circle amplification. The Polonator was also going in this direction, so QIAGEN's long-delayed box may be as well. High throughput sequencing's efficiency arises from deriving increasing amounts of data from a given amount of pricey reagents. While read length increases have contributed to this, the major route to throughput gains has been to increase the number of templates sequenced in parallel. This also has the advantage of increasing efficiency for a fixed unit of time; read length increases for a given platform always engender longer run times. Sometimes those longer times can be compensated by improvements elsewhere, but it will always be the case that a given chemistry can run half the read length in half the time.

Even back when I was a graduate student, George Church was emphasizing that the ultimate goal for any image-based sequencing system would be to read one base per imaging pixel. If BGI can use nanoballs to achieve far greater densities than the current class of Illumina instruments, then they might have an angle. However, Illumina is full of clever people and it would be a mistake to count them out. Who would have thought the same chemistry that a decade ago delivered 25bp reads could do 10 times that, while consistently increasing accuracy and reducing cycle times? 2x100 in a week and a half at the early part of this decade was amazing; getting 2x250 in shorter time on more templates far more so. So I think it is foolish to assume that Illumina has little running room left in their platform. Given some fierce competition, it is likely that the San Diegans will be willing to take bigger risks.
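As a back-of-the-envelope illustration of why that one-base-per-pixel ideal matters, consider how many bases a run can yield per imaging pass. All numbers below are invented for illustration, not specifications of any real instrument:

```python
# Back-of-the-envelope: how imaging density drives sequencing throughput.
# Every parameter value here is an illustrative assumption, not a spec
# for any actual sequencer.

def bases_per_run(sensor_megapixels, pixels_per_cluster, read_length, tiles):
    """Total bases in one run: clusters resolvable per imaged tile,
    times the number of tiles, times one base per cluster per cycle."""
    clusters_per_tile = sensor_megapixels * 1e6 / pixels_per_cluster
    return clusters_per_tile * tiles * read_length

# A conventional system might spend many pixels resolving each cluster...
conventional = bases_per_run(5, 20, 100, 1000)
# ...while the one-base-per-pixel ideal spends exactly one.
ideal = bases_per_run(5, 1, 100, 1000)

print(f"conventional: {conventional:.2e} bases")
print(f"ideal:        {ideal:.2e} bases")
```

With these made-up numbers the ideal delivers 20x the data from the same optics and the same number of imaging cycles, which is why shrinking features (nanoballs, smaller wells) is such an attractive lever.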

For example, Illumina has achieved high cluster densities on their top-of-the-line HiSeqs (first the X series, now the 4000) with patterned flowcells. Rather than relying on random processes to lay clusters out on the surface, the flowcell has nanoengineered wells, each of which can amplify only a single template by a process termed exclusion amplification. Since the pattern is defined and regular, cluster localization is much simpler. Furthermore, clusters cannot grow into each other, which should reduce the variability in cluster size between densely clustered and less densely clustered regions. This is intended to enable greater tolerance to variations in the amount of library loaded on the flowcell; if exclusion amplification really works, then the system should be insensitive to overloading.

Last year's launch of the X10 was their first foray into this technology, which seemed poised for a gradual rollout to the rest of the line. If BGI launches a benchtop instrument, as some of the rumors hold, that could easily accelerate Illumina pushing this to instruments in the NextSeq and MiSeq class (per my previous speculation). That would defend the low end. On the high end, it would seem unlikely that Illumina was maximally aggressive with the cluster density on the patterned flowcells; etch a greater density of wells and the flowcell can support more reads. Obviously, there are a lot of issues involved -- the wells will be closer together and smaller, meaning more strain on the biochemistry and less tolerance for manufacturing defects in the smaller features. Smaller wells with smaller clusters mean less signal.
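To make the density lever concrete: well count on a patterned flowcell scales with the inverse square of the pitch (center-to-center spacing). Here is a minimal sketch; the flowcell dimensions and pitches are hypothetical, chosen only to show the scaling:

```python
# Wells on a patterned flowcell grow as 1/pitch^2.
# Dimensions and pitches below are invented for illustration only.

def wells_on_flowcell(width_mm, length_mm, pitch_um):
    """Number of wells on a square grid at the given pitch."""
    wells_across = (width_mm * 1000) // pitch_um
    wells_along = (length_mm * 1000) // pitch_um
    return int(wells_across * wells_along)

# Halving the pitch quadruples the well count -- and hence potential
# reads -- at the cost of smaller, dimmer, more defect-sensitive wells.
coarse = wells_on_flowcell(25, 75, 1.0)   # 1.0 um pitch
fine = wells_on_flowcell(25, 75, 0.5)     # 0.5 um pitch

print(coarse, fine, fine / coarse)
```

The quadratic payoff is exactly why tighter etching is tempting, and the shrinking signal per well is exactly why it is risky.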

Complete Genomics has historically had truly short reads, I think more in the 35-50 range than the 100+ that passes for short in the sequencing world. While that can be more of a problem for downstream mapping, as the industry demonstrated a long time ago one can get a lot of mileage from short reads. BGI/CG may have decided that very small features (the nanoballs) were the right route, offering very high numbers of templates sequenced. Small means less signal and more noise, particularly as reads "dephase" when whatever interrogation process (sequencing by synthesis, sequential ligation) fails to work in perfect unison on all the molecules in a cluster. Short reads may have been the price to pay for extreme density.

Could Illumina go the same route and abandon hard-won read length gains in favor of extreme density? There's the engineering question and then the business question. On the engineering side, if clusters can support 250 cycles of chemistry they must be very bright at the beginning, but going for extremely high densities would engender challenges both for positioning the clusters densely and then imaging them. Solexa/Illumina I think always envisioned very high densities, but the market pulled them in a different direction. That gets to the business side: could Illumina veer sharply off their chosen path, going for very high densities with short reads, without major disruption of their commercial message?

Sequencing-by-ligation potentially offers an accuracy advantage, as ligases are extremely finicky about insisting on correct base-pairing at the ligation junction. This can certainly be enhanced via protein engineering and/or directed evolution. Furthermore, with a ligation approach the unnatural labels can be located far away from where the enzyme is focused, whereas with a modified nucleotide the polymerase is confronted with the oddity. Hence, every sequencing-by-synthesis platform has also worked on engineering polymerases to be as blind as possible to the fluorescent labels, or at least color-blind so that none is preferred over another.

Will BGI be able to deliver on their promises in time to ruin a past prediction of mine? I'd love to see it, but it would hardly be surprising if the announcement in June fills in some details, but one of those details is a much later launch of the system. Even so, the possibility of a short read platform competitive with Illumina should make nearly anyone in the market happy. I even think that Illumina would relish going head-to-head with a serious rival. Whether that will emerge from Shenzhen remains to be seen.

8 comments:

I do hope they have something decent though. It might make dealing with Illumina more bearable. I'm not sure how many of us would be interested in anything shorter than 100bp. That seems more of a step back than a step forward.

I don't believe they will sell the instrument around the world; they don't have a large sales and engineering group to support it. There is a huge need in the Chinese market for sequencers, so they will just become dominant in the service industry. But it is very exciting to see another technology!

At the VIB conference in February, the CSO of Complete Genomics presented on these and mentioned two key things: 1) their LFR (Long Fragment Read) methods, using 10kb-1Mb fragments - published in Frontiers in Genetics http://journal.frontiersin.org/article/10.3389/fgene.2014.00466/abstract; 2) the claim they can sequence 18 genomes per run - 60 billion reads/3Tb per run (98% of exome bases called at an error rate of 1/Mb) - with a target of $1/Gb.

On some level it seems kind of preposterous that BGI/CG can come up with something that would immediately wipe the floor versus Illumina sequencers. Aside from breaking the laws of physics regarding imaging speed and quality, it is really telling that there have been few reports in the media and scientific crowds about the technology. Unless the culture of BGI is to be incredibly secretive -- more so than that of Apple -- the lack of news is not a great sign.

Didn't Illumina just claim their NextSeq V2 chemistry is as good as HiSeq? If that's true and the next iteration of HiSeq X uses 2-color chemistry, the throughput can at least double. So no worry for Illumina for now.

It's interesting to consider how in situ sequencing will play out in this context. With in situ sequencing we can already fill the cellular space completely with rolonies (which are approximately diffraction limited in size), such that our imaging is very efficient (~4-8 pixels per rolony, with no "dark" space). Since our reads are single molecule by nature, no throughput is wasted sequencing PCR clones. Because we image in 3D, the sequencing time is very efficient (we can sequence an Illumina-sized flow cell in XY dimensions, but also 25 or 50 um in Z simultaneously). It is also possible to combine RNA-seq with genomic sequencing simultaneously, such that over a moderate number of cells one could achieve genomic coverage, while also getting thousands of RNA-seq data-points per cell. Spatial information in sequencing is an interesting new frontier.

By the way I'm a long-time reader of your blog, but first time commenting. I'm one of the first authors on the in situ sequencing paper published in Science last year from George Church's lab.

About Me

Dr. Robison spent 10 years at Millennium Pharmaceuticals working with various genomics & proteomics technologies & working on multiple teams attempting to apply these throughout the drug discovery process. He spent 2 years at Codon Devices working on a variety of protein & metabolic engineering projects as well as monitoring a high-throughput gene synthesis facility. After a brief bit of consulting, he rejoined the cancer drug discovery field at Infinity Pharmaceuticals in May 2009. In September 2011 he joined Warp Drive Bio, a startup applying genomics to natural product drug discovery. Other recurring characters in this blog are his loyal Shih Tzu Amanda and his teenaged son alias TNG (The Next Generation).
Dr. Robison can be reached via his Gmail account, keith.e.robison@gmail.com
You can also follow him on Twitter as @OmicsOmicsBlog.