Developing the World's First Single Molecule Protein Sequencer

The "Genomic Era" saw a multitude of advances in methods to analyze the genome, and as such, genetics research has progressed in leaps and bounds since the completion of the Human Genome Project in 2003.

The field of proteomics, in which the proteome of a cell, tissue or organism is studied, has somewhat lagged behind, largely due to immense technical challenges. Next-generation protein sequencing is a "renewed push towards fundamentally different approaches to identify and quantify every single protein species in complex biological mixtures," Jagannath "Jag" Swaminathan, co-founder of Erisyon, told Technology Networks in a recent interview.

Swaminathan is the co-founder of Erisyon and a researcher in the Marcotte Lab at the University of Texas at Austin. Erisyon is drawing on over a decade's worth of proteomics research at the University of Texas to commercialize the world's first single molecule protein sequencer. In this interview, we discuss the concept of next-generation protein sequencing, clinical proteomics and the critical role of protein sequencing in the fight against COVID-19.

Molly Campbell (MC): For our readers that may be unfamiliar, please can you expand on what next-generation protein sequencing is?

Jagannath Swaminathan (JS): Next-generation DNA sequencing caused a revolution in our understanding of biology by democratizing access to seemingly unlimited amounts of phenomenally high-quality genomic data. Next-generation protein sequencing is an effort to do the same for proteomics.

To be a bit more specific, next-generation protein sequencing is a renewed push towards fundamentally different approaches to identify and quantify every single protein species in complex biological mixtures. Our technology, fluorosequencing, has adapted the intrinsic features of single molecule optical sensitivity and the massively parallel architecture from the successful next-generation DNA sequencing platforms and brings the features of high sensitivity, throughput and digital quantification to proteomics. The implications of being able to study proteins with a similar resolution and fidelity range across all of biological research and clinical applications, from being able to better understand the causes of diseases like Parkinson’s and Alzheimer’s to earlier diagnosis and more effective treatment of cancer.

I’ve been involved in proteomics since my undergraduate work in analyzing mass spectrometry (MS) data and co-invented the fluorosequencing technology during my graduate study with Prof. Edward Marcotte. Having worked in a proteomics lab and seen the range of biological questions that researchers are trying to answer, it became very clear to me that protein science is seriously lagging behind genomics, especially when it comes down to identifying low-abundance proteins, and more importantly, in quantifying them. Of course, there is a reason why DNA sequencing advanced faster than protein sequencing. The whole point of DNA is to be read and copied; once PCR reactions were discovered we could amplify a low signal incredibly easily. There’s no analogue to that for proteins. Similarly, there are only five bases we need to identify for DNA and RNA interactions, whereas for proteins there are 21 amino acids and a huge number of post-translational modifications. These are just a couple of the immense challenges that need to be overcome to deliver a system that meets the performance of a next-generation DNA sequencer.

I’m very proud to say that we have made great progress in overcoming them to produce a working system that delivers on many of the hopes we had when we started on the project. I’m equally excited about the growth of the research community that is tackling this challenge with incredible creativity. In September of last year, the 2nd Single Molecule Protein Sequencing conference was held in Jerusalem, and it was really so exciting to see all the approaches people were taking to decipher proteins better and faster.

MC: Can you discuss the technology in comparison to existing analytical technologies adopted in proteomics, such as mass spectrometry?

JS: When we think about how the field analyzes proteins, we usually refer to three approaches: affinity assays, MS and RNA sequencing.

With single molecule protein sequencing, we are trying to borrow the best features from each. Affinity assays are, in many ways, the workhorse of clinical and research biology. From blood tests to pregnancy kits to Western blots, affinity assays generate an astonishing amount of critical data. They’re fast, cheap, and can be exceptionally sensitive, as an affinity reagent like an antibody is typically capturing or measuring a single molecule at a time. Unfortunately, their results are really qualitative, and it’s hard to measure the abundance of multiple protein species in the same assay. Furthermore, affinity assays are biased tests where the only thing that will be found is what you can test for. That makes it very difficult to conduct a general survey on a sample.

MS is the gold standard for protein analysis. It’s truly a marvel of physics that’s been the basis for three Nobel prizes and, in my opinion, probably deserves a fourth for the development of the Orbitrap. It’s incredibly precise, unbiased, and can provide abundance measurements for nearly all of the constituent proteins of a sample. However, it’s not without its drawbacks. Chief among them, at least for me, is that MS requires enough homogeneous sample to fill its chamber, which is roughly 1,000,000 molecules. That’s actually incredibly limiting as there are so many crucial applications where the sample concentration is well below that baseline. For instance, in order to identify the antigens expressed on a tumor sample, we’ve found that we need somewhere between 100M and 1B cells. That’s orders of magnitude more than is available from a clinical tumor biopsy, something like carving out a quarter of your pancreas for a test!

Finally, RNA sequencing has become one of the most popular approaches for protein analysis. It is, of course, exquisitely sensitive, massively parallel and provides explicit quantification of every molecule in a sample. Unfortunately, it’s an indirect measurement of protein expression whose correlation with actual protein measurements is a source of substantial debate. The fluorosequencing technology is our attempt to merge the best of these techniques into a single platform: the specificity of affinity assays, the unbiased analysis of MS and the sensitivity and quantified nature of RNA sequencing.

MC: How might the capabilities of a single molecule protein sequencer advance biopharmaceutical research?

JS: We believe that there is significant and abundant opportunity for single molecule protein sequencing in biopharmaceutical research. We are light years ahead of where we were twenty years ago in understanding health and disease because of the ubiquitous success of next-generation DNA/RNA sequencing. Our hope is that single molecule protein sequencing can propel our understanding in a similar way.

As an example, I’ve already mentioned the problem in identifying antigens from clinically relevant sample concentrations. This is a real challenge in the field of immuno-oncology, where it's crucial to know the identity of the target for any T-cell receptor (TCR) therapy that is to be employed. Since the detection limit of MS is well above what is recovered from a clinical sample and affinity assays are, at best, qualitative, the current solution is to infer the data using sophisticated machine learning algorithms based on the genomic and exomic signatures of the patient’s tumor. While these approaches have delivered remarkable results, they have fundamental limitations, and therefore introduce clinical risks, because they don’t directly detect the antigenic target of the therapy.

Single molecule protein sequencing can intervene here by helping researchers get a better understanding of what, when, where and why certain targets are being expressed, leading to better clinical developments and outcomes. A similar opportunity exists in the area of neurological and central nervous system diseases. So many of these debilitating illnesses, such as Alzheimer’s, ALS, and Parkinson’s, are, in many ways, disorders of post-translational modifications, specifically phosphorylation. One of the hallmarks of Alzheimer’s, for instance, is hyper-phosphorylated tau, which is believed to be the reason for tau’s agglomeration. Unfortunately, affinity assays can only qualitatively target a single phospho-site at a time, MS has a very difficult time detecting phosphorylations given its biases toward charge state, and by definition, post-translational modifications cannot be detected by RNA sequencing.

In this case, single molecule protein sequencing can provide truly novel insights on the progression of these diseases and perhaps provide new therapeutic targets for a field that has recently been dealt some staggering frustrations.

MC: There has been anticipation for several years now as to when proteomics technologies will enter the clinical space as a mainstream analytical and diagnostic tool. In your opinion, how might a single molecule protein sequencer impact this?

JS: Well, I do feel compelled to state that proteomics has already been in the clinic for decades! Tools like antibody panels and immunohistochemistry (IHC) stains are some of our very best analytical and diagnostic tools and will remain mainstays for some time to come.

That being said, we absolutely see opportunities for single molecule protein sequencing to enter the clinical workflow. We believe that there are abundant opportunities for diagnostics in the areas I’ve already mentioned, immuno-oncology and neurology, but also in liquid biopsies for early detection, patient selection and stratification for better treatment decisions, and long-term monitoring of residual disease. To provide a bit more color, it seems every week I read an absolutely beautiful, elegant, and convincing paper that has identified a set of protein biomarkers with an exceptional correlation to a disease state, but by the end I’m usually disappointed because the tools and techniques they require for clinical use don’t meet the necessary specifications.

I believe this mainly comes down to two challenges: (a) the ability to work with clinically relevant sample amounts and (b) being truly quantitative. Quantitation is an important aspect of clinical research, as measuring different forms of the same protein, or its levels across multiple orders of magnitude (especially when sample amounts are low), is challenging with contemporary methods. Translating the correlations observed in research labs to clinical settings requires that they hold up as strong clinical correlates.

This is why we believe single molecule protein sequencers, with their ability to handle clinical samples and provide absolute quantification for a large set of proteins, will be the workhorse in the clinical space. Our sincere hope and belief is that once the clinical community has access to single molecule protein sequencing, they will finally be able to take advantage of their creative breakthroughs for the benefit of their patients.

MC: Erisyon is working on the first single molecule protein sequencer. Can you please tell us more about the current status of this technology?

JS: Erisyon is working hard to commercialize the proof of concept that we developed in our lab and described in the Nature Biotechnology paper. The company is currently designing the instrument along with its upstream sample prep and downstream analytics. A publication describing the novel proteomics sample preparation procedure is currently in review and has been posted on bioRxiv. We are also working with collaborators in different fields of research to propel new applications.

MC: Please can you expand on the critical role of protein sequencing in the fight against COVID-19? What research have you conducted in this space?

JS: On the role of protein sequencing: in my opinion, COVID-19 beautifully illustrates the problems that DNA sequencing technologies address (like patient testing) and those that proteomics could solve (vaccine development and monitoring of treatment or infection). For instance, it was essential that the structure of the spike protein was solved and its various post-translational modifications (such as glycosylations) mapped before any therapeutic development, such as a vaccine, could be started.

Techniques such as MS and cryo-EM were used to elucidate the structure and aid in the development, relying on recombinantly engineered tissues to produce the proteins in large amounts in the lab. However, shortening the timeline to create and engineer effective antibodies is of utmost importance today, and this is perhaps where I see new technologies for protein sequencing fitting right in.

Directly cataloguing the diversity of the spike proteins and their various post-translational modifications from patient samples, instead of relying on tissue cultures in vitro, will produce a more realistic understanding of the virus in the wild and thus provide the best information for therapeutic development and reduce the time and effort needed during clinical trials.