Active Learning

Active Machine Learning is the next evolution of machine learning and provides the basis for Quantitative Medicine’s Computational Research Engine™ (CoRE™). The CoRE™ employs active machine learning to direct experimentation across drug discovery and development experimental spaces to make predictions that more effectively and efficiently identify the optimal drug candidates.

Unlike any other in silico predictive methodologies or computational methods currently in use, the active learning process driving the Computational Research Engine™ is iterative. In each iteration, its algorithms recommend the most informative experiments to execute and improve the predictive model based on the results. Most current computational methods are developed only to predict the results of unobserved experiments, with little regard for what information might be missing. This can lead to significant prediction errors, which can carry substantial costs. In contrast, the CoRE™’s approach does not assume that the most relevant research data has been defined at the outset. Instead, by directing experimentation, the Computational Research Engine™ identifies the most relevant experiments to execute, significantly improving results and accuracy while lowering costs.

Collaborative Active Learning Analytics – By augmenting bench science researchers and their experiments with in silico models, we collaborate with them to explore the experimental space efficiently by directing the experimentation. In this way, the CoRE™’s analytics achieve greater predictive accuracy than the same effort using only bench science and current predictive analytics.

Big Data Aggregation and Analysis Capabilities – The CoRE™ can accommodate very large amounts of data. In fact, the predictive power and breadth of the model improve as more data is incorporated. With the CoRE™’s big data capabilities, we can:

Supplement clients’ data with our curated data set of over 200 million data points, greatly speeding improvements in the accuracy of predictive models.

Aggregate data from across pharma R&D silos in order to:

Discover relationships that can support research across diverse disease states.

Work concurrently with multiple research campaigns at different phases of drug discovery.

Polypharmacology Potential – The CoRE™ can develop predictive models from millions of compounds and thousands of targets concurrently. This enables exploration of the complex polypharmacological relationships required to develop “multi-target drugs” (MTDs). This capability, applied to big-data-scale research spaces, is an industry breakthrough.

Simplified Analytics for Go/No-Go Decisions – The CoRE™ can help researchers decide on the single most informative and cost-effective next step in the decision-making process to progress a drug candidate.

How Active Learning Works

Active learning is a type of advanced machine learning. Machine learning uses algorithms employing statistical methods to analyze a data set and find potential patterns. Inferences derived from those patterns are predictions that are tested with new data. However, unlike traditional statistics, the algorithms can perform millions of calculations to discover patterns of relationships in the data. Generally, machine learning methods are very robust, in that they can be applied to many kinds of data of differing quality and completeness.

Active learning is an iterative process in which machine learning methods “learn” a predictive model using a small sample of the available data. Depending on the requirements, the machine learning methods used may include graphical models, decision trees, support vector machines, regression methods and density estimation methods.

Next, using this model, a set of recommended experiments is identified. The model chooses the experiments expected to most improve the predictive accuracy of the model. The active learning process repeats, as a new model is “learned” using the prior data and the data newly acquired from the recommended experiments. Iteratively repeating this process generates a more accurate predictive model, requiring significantly less experimentation than other predictive analytics.
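The loop described above can be illustrated with a deliberately small sketch. It is not the CoRE™'s proprietary method: it assumes a toy one-dimensional problem in which an outcome is positive whenever a value exceeds a hidden threshold, and it uses uncertainty sampling (querying the candidate closest to the current decision boundary) as a stand-in for "most informative experiment". The names fit_threshold, run_experiment and active_learning, and the hidden threshold of 0.637, are hypothetical and chosen only for illustration.

```python
import random


def fit_threshold(labeled):
    """Learn a 1-D threshold model from (x, outcome) pairs.

    The estimate is the midpoint between the largest known negative
    and the smallest known positive (defaults: 0.0 and 1.0).
    """
    negatives = [x for x, y in labeled if not y]
    positives = [x for x, y in labeled if y]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2.0


def run_experiment(x, hidden_threshold=0.637):
    """Hypothetical stand-in for a bench experiment: reveals one outcome."""
    return x > hidden_threshold


def active_learning(pool, n_rounds, seed_size=2, rng=None):
    """Iteratively pick the most uncertain candidate, test it, and refit."""
    rng = rng or random.Random(0)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)

    # Start from a small random sample of the available data.
    labeled = [(x, run_experiment(x)) for x in unlabeled[:seed_size]]
    unlabeled = unlabeled[seed_size:]

    for _ in range(n_rounds):
        if not unlabeled:
            break
        boundary = fit_threshold(labeled)  # learn the current model
        # Uncertainty sampling: recommend the candidate nearest the boundary.
        query = min(unlabeled, key=lambda x: abs(x - boundary))
        unlabeled.remove(query)
        labeled.append((query, run_experiment(query)))  # run it, add result

    return fit_threshold(labeled), len(labeled)


if __name__ == "__main__":
    pool = [i / 1000 for i in range(1000)]
    estimate, n_experiments = active_learning(pool, n_rounds=20)
    print(f"estimated threshold {estimate:.4f} after {n_experiments} experiments")
```

In this toy setting, each recommended experiment roughly halves the region in which the boundary could lie, which is why a few dozen experiments out of a pool of 1,000 are enough. Real experimental spaces are far less tidy, so the sketch shows the shape of the loop rather than expected performance.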

We have demonstrated the power of this process through simulations using public as well as pharmaceutical company data. In these simulations, data from previously executed experiments were hidden and then revealed as if the experimentation were being directed by an active learning process. In one study, we used a dataset of 177 assays of drug targets and 20,000 potential drug compounds. The proprietary system identified nearly 60% of the hits after exploring only 3% of the experimental space. This is equivalent to finding 60% of the hits in a matrix of 1,000,000 data points after doing only 30,000 experiments. Other techniques would need 600,000 experiments to find 60% of the hits. These results indicate that significant improvements in R&D efficiency are achievable using active learning.
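The arithmetic behind these figures can be checked with a few lines of Python. This is a sketch under one stated assumption: the 600,000-experiment baseline is read here as an unguided approach that finds hits in proportion to the fraction of the space explored. The text does not say how that baseline was derived. The function name experiments_needed is hypothetical.

```python
def experiments_needed(space_size: int, fraction_explored: float) -> int:
    """Number of experiments implied by exploring a fraction of the space."""
    return round(space_size * fraction_explored)


# Illustrative matrix size quoted in the text.
illustrative_space = 1_000_000

# Active learning: about 60% of hits after exploring 3% of the space.
active = experiments_needed(illustrative_space, 0.03)

# Assumption (not stated in the text): an unguided baseline finds hits in
# proportion to the space explored, so 60% of hits needs 60% of the space.
baseline = experiments_needed(illustrative_space, 0.60)

reduction_factor = baseline / active

# The study's own matrix: 177 assays x 20,000 compounds.
study_space = 177 * 20_000
study_active = experiments_needed(study_space, 0.03)

print(active, baseline, reduction_factor, study_space, study_active)
```

On the illustrative 1,000,000-point matrix this gives 30,000 versus 600,000 experiments, a 20-fold reduction. Applied to the study's own 177 × 20,000 matrix of 3,540,000 data points, exploring 3% corresponds to about 106,200 experiments.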