3. Survey the Sequence Space: Test Infologs in a commercially relevant assay and quantify the relative contribution of each variable. The results allow us to deconvolute how substitutions within a protein sequence modify its function.

4. Map the Sequence Space: Establish a sequence-function model from the assay results and cross-validate it. Models are assessed on their predictive value.

Repeat: Explore new sequence space, using the model to design a new set of systematic variants.

Results: Select the best functionally improved variant(s) after multiple rounds of the above.
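The survey and map steps above can be illustrated with a minimal sketch. It is not DNA2.0's algorithm: it assumes a purely additive model fitted by lightly regularized least squares, a hypothetical half-fraction design of 5 substitutions, and simulated assay readings. All names and numbers in it are illustrative assumptions.

```python
import random
from itertools import product

# Hypothetical design: 16 of the 32 possible combinations of 5 substitutions.
# The fifth column is the parity of the first four (a half-fraction factorial),
# so every substitution is present in exactly half of the variants.
variants = [bits + (sum(bits) % 2,) for bits in product((0, 1), repeat=4)]

# Hypothetical "true" effects, used only to simulate assay readings.
true_baseline = 1.0
true_effects = [0.8, -0.3, 0.0, 0.5, 0.2]
rng = random.Random(0)
activity = [
    true_baseline
    + sum(w * x for w, x in zip(true_effects, v))
    + rng.uniform(-0.05, 0.05)  # bounded stand-in for assay noise
    for v in variants
]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_additive(X, y, ridge=1e-6):
    """Fit y ~ w0 + sum_j w_j * x_j; returns [w0, w1, ..., wp]."""
    rows = [[1.0] + [float(x) for x in r] for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for j in range(1, p):  # small penalty on effects, none on the baseline
        A[j][j] += ridge
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(A, b)

def predict(weights, variant):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], variant))

def leave_one_out_rmse(X, y):
    """Refit without each variant in turn and predict the held-out one."""
    squared_errors = []
    for i in range(len(X)):
        w = fit_additive(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        squared_errors.append((predict(w, X[i]) - y[i]) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5

weights = fit_additive(variants, activity)
loo_rmse = leave_one_out_rmse(variants, activity)
print("estimated effects:", [round(w, 2) for w in weights[1:]])
print("leave-one-out RMSE:", round(loo_rmse, 3))
```

The fitted weights are the estimated per-substitution contributions, and the leave-one-out error is one simple measure of predictive value. Real assay data would call for a model and validation scheme chosen to suit the noise and any interactions between substitutions.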

Advantages

Optimize directly for function in the final application

Save years of time and millions of dollars

No high-throughput (HTP) screens

Screen small numbers of variants (50-200) directly for the desired function, ideal for Enzyme Engineering

Don’t waste time pursuing false positives: variants identified by HTP screens that do not retain activity in the ‘real’ assay

No false negatives lost due to screening error or poor correlation between the HTP screen and the ‘real’ assay

No biodiversity collections required, everything is synthesized as needed

Typical protein engineering methods rely on a surrogate high-throughput (HTP) screen of a large number (10^6 to 10^12 or more) of gene variants to identify initial hits with improved activity. Unfortunately, you get what you screen for: the “hit” from the HTP screen often has very little real activity in the lower-throughput assay that better reflects the functionality for which the protein is being developed.

ProteinGPS™ instead relies on identifying key amino acid substitutions through bioinformatics-based mining of available sequence space and combining these substitutions in an information-maximized variant dataset (usually fewer than 100 unique gene variants). At that scale, the activity for the commercially relevant function can readily be determined in an indicative assay. DNA2.0 then uses advanced machine learning algorithms to deconvolute the relative contribution of each substitution and so map the megadimensional sequence space contributing to the desired protein activity. We routinely see orders-of-magnitude functional improvement by measuring no more than 100-300 samples.

Applying Modern Engineering Principles

The bioengineering technology developed by DNA2.0 is based on mathematical nonlinear systems modeling and optimization algorithms routinely used in such diverse areas as small molecule QSAR, process control design for manufacturing, website optimization, and logistics. These problems all require methods that can analyze systems with high complexity and large numbers of independent impactful variables. Over the past seventy years, mathematicians and engineers have developed algorithms for identifying optimal solutions from data sets that are very small relative to the total potential information space being interrogated. Today, these principles are used in the development of numerous products, from the design of jet engines to the optimization of gasoline formulations to credit card fraud detection. Methods for multidimensional optimization that are now routinely employed in other engineering disciplines contrast starkly with both structure-based protein design and directed evolution, which have no real parallels in other engineering areas.

Developing Algorithms for Engineering Proteins

At DNA2.0 we have modified the standard algorithms for engineering complex systems to work with biological systems. The resulting process enables us to deconvolute how substitutions within a protein sequence modify its function. We have combined these algorithms with an integrated query and ranking mechanism to identify appropriate sequence substitutions.

From Predicted Sequences to Testable Genes

The conversion of computationally predicted DNA sequences to physically testable genes is powered by our gene synthesis pipeline. Until recently, the synthesis of individually designed genes was prohibitively expensive. As a result, the only practical way to obtain combinatorially modified proteins was to make recombinant libraries, which in turn necessitated high-throughput screens. By instead synthesizing individually designed gene variants, DNA2.0 ensures that amino acid changes are distributed to achieve maximum information content. This in turn obviates the need for high-throughput screening, allowing us instead to focus on measuring protein properties that are important for the final application.
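As a rough illustration of distributing changes evenly across individually designed variants, the sketch below builds a small presence/absence design in which every substitution appears equally often, then keeps the candidate whose substitutions are least correlated with one another. The random search, the sizes (24 variants, 8 substitutions, each present 6 times) and all function names are assumptions made for illustration. This is not the DNA2.0 design method.

```python
import random
from itertools import combinations

def balanced_column(n_variants, n_present, rng):
    """One substitution's presence (1) / absence (0) across all variants."""
    column = [1] * n_present + [0] * (n_variants - n_present)
    rng.shuffle(column)
    return column

def correlation(u, v):
    """Pearson correlation between two presence/absence columns."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

def max_abs_correlation(columns):
    """Worst-case pairwise correlation between any two substitutions."""
    return max(abs(correlation(u, v)) for u, v in combinations(columns, 2))

def candidate_designs(n_variants, n_subs, n_present, n_trials, seed=0):
    rng = random.Random(seed)
    return [
        [balanced_column(n_variants, n_present, rng) for _ in range(n_subs)]
        for _ in range(n_trials)
    ]

# Hypothetical sizes, chosen only to keep the example small.
candidates = candidate_designs(n_variants=24, n_subs=8, n_present=6, n_trials=200)
best = min(candidates, key=max_abs_correlation)

# Rows of the final design are variants; columns are substitutions.
design = [list(row) for row in zip(*best)]
print("worst pairwise correlation, first candidate:",
      round(max_abs_correlation(candidates[0]), 3))
print("worst pairwise correlation, selected design:",
      round(max_abs_correlation(best), 3))
```

Low correlation between substitutions is what lets a later model attribute a change in function to one substitution rather than to another that happened to travel with it. A recombinant library gives no such control over which changes co-occur.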

Using independently designed synthetic genes in which substitutions are systematically incorporated (Infologs™) leads to uniform sampling, systematic variance and unrestricted, information-rich results. Wheat GST with the ability to detoxify a panel of common herbicides was designed using this patented DNA2.0 bioengineering method. The relative functional contribution of 60 amino acid substitutions against 14 herbicides was quantified using only 96 Infologs, and function was dramatically improved by a small set (16) of second-generation Infologs. Check out the full “Using Infologs to Engineer Biological Systems” presentation or the ACS Synthetic Biology publication.
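To put the numbers from the wheat GST example in perspective, the short calculation below compares the number of variants tested with the number of possible combinations. It assumes each of the 60 substitutions can be independently present or absent, which may not hold if several substitutions fall at the same position. It also assumes a purely additive model as the simplest case.

```python
# Figures quoted in the text above.
n_substitutions = 60
n_first_round = 96
n_second_round = 16

# Assumption: every substitution is an independent present/absent choice.
possible_combinations = 2 ** n_substitutions

# An additive model needs one baseline term plus one term per substitution.
additive_parameters = n_substitutions + 1

total_variants = n_first_round + n_second_round
fraction_sampled = total_variants / possible_combinations

print(f"possible combinations: {possible_combinations:.2e}")
print(f"additive model parameters: {additive_parameters}")
print(f"variants tested: {total_variants}")
print(f"fraction of the space sampled: {fraction_sampled:.1e}")
```

Under these assumptions, 96 first-round variants exceed the 61 parameters of an additive model, which is why a well-distributed set of that size can in principle estimate every substitution's contribution.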

Researchers at Pfizer and DNA2.0 have published the engineering of an aminotransferase for the biocatalysis of a key chiral intermediate in the synthesis of imagabalin, an advanced anxiolytic drug candidate. The starting wild-type protein, Vfat, is an ω-amino acid:pyruvate transaminase with very weak but detectable catalytic activity toward aliphatic amines. Designing and testing fewer than 450 Vfat variants synthesized by DNA2.0 resulted in an aminotransferase with substrate selectivity and reaction velocity sufficient for the commercial biocatalysis goal.

ProteinGPS Engineering Overview

Webinar – ProteinGPS Engineering via Systematic Exploration of Space

Learn more about Protein Engineering and Infologs. ProteinGPS relies on identifying key amino acid substitutions through bioinformatics-based mining of available sequence space and combining such substitutions in information maximized Infologs – synthetic gene variants designed to be systematically varied across the searched space.

Using a small set of variants to explore the sequence space systematically helps us understand the effects of substitutions on protein activity and determine strategies for further exploration of the sequence space. This is achieved by using machine learning techniques to analyze data from a small number of systematically designed variants of the protein, usually on the order of 100. We can then address questions about additivity and multidimensional effects of substitutions on the various properties and activities that can be measured accurately under commercially relevant conditions.
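One simple way to pose the additivity question for a pair of substitutions is to compare the double variant with what the two single variants would predict. The sketch below assumes effects combine additively on a log scale, which is a common modeling choice and not something the text specifies. The activity values are made up for illustration.

```python
import math

def log_interaction(wild_type, single_a, single_b, double_ab):
    """Deviation of the double variant from log-additivity.

    Zero means the two substitutions act independently on a log scale;
    positive means they work better together than expected; negative, worse.
    """
    return (
        math.log(double_ab)
        - math.log(single_a)
        - math.log(single_b)
        + math.log(wild_type)
    )

# Hypothetical relative activities (wild type = 1.0).
additive_case = log_interaction(1.0, 2.0, 3.0, 6.0)      # 2x and 3x give 6x
synergistic_case = log_interaction(1.0, 2.0, 3.0, 12.0)  # better than expected

print("additive case:", round(additive_case, 6))
print("synergistic case:", round(synergistic_case, 6))
```

With a systematically designed variant set, the same comparison can be made for many pairs at once by adding pairwise terms to the fitted model instead of measuring every single and double variant separately.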

Webinar – Using ProteinGPS and Infologs to Engineer Biological Systems

ProteinGPS™ relies on identifying key amino acid substitutions through bioinformatics-based mining of available sequence space and combining such substitutions in information maximized Infologs – synthetic gene variants designed to be systematically varied across the searched space. The presentation includes recent case studies.

GenomeGPS™ and PathwayGPS™

The bioengineering technology developed by DNA2.0 can also be used to develop completely novel genomes or to optimize pathways.

PathwayGPS™ and GenomeGPS™ build on DNA2.0’s other GPS systems to explore higher order combinations of multiple genes into functionally improved metabolic pathways. Our capability for low-cost, high-capacity gene synthesis enables synthesis of multi-component multi-gene pathways up to several hundred kilobases in size.

Systematic non-correlated variation of control elements such as operators, promoters, and terminators across complex metabolic pathways while simultaneously varying individual genes to cover a range of expression levels, specificity and activity allows for sampling of vast areas of metabolic space. Application of advanced machine learning algorithms then enables a determination of each element’s contribution to pathway efficacy within a multitude of complex, interacting enzyme activities. The elements are then engineered to drive the system to its optimal performance using a minimum number of assays.
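Non-correlated variation of several elements can be illustrated with a classic orthogonal array. The sketch below varies four hypothetical pathway elements at three levels each, using 9 constructs in place of the 81 a full factorial would need, such that every pair of elements sees each combination of levels exactly once. The element names and levels are placeholders, and this textbook L9 design only stands in for whatever design DNA2.0 actually uses.

```python
from itertools import combinations, product

# Hypothetical elements and levels; real designs would use actual parts.
factors = {
    "promoter": ["weak", "medium", "strong"],
    "operator": ["op1", "op2", "op3"],
    "terminator": ["t1", "t2", "t3"],
    "gene_variant": ["v1", "v2", "v3"],
}

# L9 orthogonal array: two free columns (a, b) plus two derived columns.
# Arithmetic modulo 3 guarantees every pair of columns is balanced.
level_rows = [
    (a, b, (a + b) % 3, (a + 2 * b) % 3)
    for a, b in product(range(3), repeat=2)
]

names = list(factors)
constructs = [
    {name: factors[name][level] for name, level in zip(names, row)}
    for row in level_rows
]

def is_pairwise_balanced(rows, n_levels=3):
    """True if every pair of columns contains each level pair exactly once."""
    n_columns = len(rows[0])
    return len(rows) == n_levels ** 2 and all(
        len({(row[i], row[j]) for row in rows}) == n_levels ** 2
        for i, j in combinations(range(n_columns), 2)
    )

full_factorial_size = 3 ** len(factors)

for construct in constructs:
    print(construct)
print("constructs built:", len(constructs), "of", full_factorial_size)
print("pairwise balanced:", is_pairwise_balanced(level_rows))
```

Because no two elements vary together, the effect of each on pathway output can be estimated separately from the same small set of assays.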