Is it AI, or is it good old-fashioned statistics?

The power of Artificial Intelligence to drive cars, accurately diagnose diseases, recognise cats on the internet, and augment human faces with cartoon rabbit ears is driving a huge wave of interest in the field, including in how it can enable drug discovery. Investment inevitably follows, and with it the tendency to rebrand many techniques as “AI” and to use the sledgehammer of deep neural networks in situations where simpler, faster techniques work equally well.

Precisely what constitutes AI sparks debate and quickly becomes a question of philosophy, but at a practical level, how does the current wave of developments differ from previous advances in early-stage drug discovery?

Machine learning and statistics have been used for decades in drug discovery. Z statistics run through many aspects of assay design and underlie tests for the power of an assay to identify active compounds. Use of Bayesian machine learning approaches, and others, to predict compound activities by learning from the measured activities of similar molecules is well established.
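As a concrete illustration of this kind of assay statistic, the sketch below computes the Z'-factor, a measure widely used in screening to judge how well an assay separates its positive and negative controls. Whether this is the exact statistic meant above is an assumption; the function name and the control readings are invented for illustration.

```python
from statistics import mean, stdev


def z_prime(positive_controls, negative_controls):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive_controls) - mean(negative_controls))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    # Sample standard deviations of each control population.
    spread = 3 * (stdev(positive_controls) + stdev(negative_controls))
    return 1 - spread / separation


# Hypothetical control-well readings, made up for this example only.
pos = [98.0, 100.0, 102.0]
neg = [9.0, 10.0, 11.0]
print(round(z_prime(pos, neg), 3))
```

A value close to 1 means the controls are well separated relative to their noise; a commonly quoted rule of thumb treats values above 0.5 as a robust assay. No neural network is involved, only means and standard deviations.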

All these methods existed before the current AI boom took place, and before logistic regression and automatically applying a confidence threshold became “AI”. Is there really anything new?

Perhaps the distinguishing feature of recent AI developments in early-stage drug discovery, compared with the machine learning routinely applied in the 1990s and 2000s, is the ability to automatically extract from the data the features that are important for explaining the problem at hand. Traditionally, a user would heavily process the data to reduce it to defined features that were expected to describe the problem. Different algorithms might be run on images to detect edges or identify particular shapes, and the results then analysed and passed to the machine learning algorithm to classify the image. Molecules were reduced to sets of substructures that were expected to represent the important features and explain the data.
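The traditional workflow described above can be sketched in a few lines: a human chooses the features, each molecule is reduced to a presence/absence vector, and a simple Bayesian classifier learns from those vectors. This is a minimal sketch under loud assumptions: the feature list, the molecules and the activity labels are all invented, and "substructures" are matched as raw substrings of the SMILES text, which is not real substructure searching and is used only to keep the example dependency-free.

```python
import math

# Hypothetical hand-picked features, matched as plain SMILES substrings.
# Real pipelines would use proper substructure matching in a chemistry toolkit.
FEATURES = ["C(=O)O", "N", "c1ccccc1", "Cl"]


def featurise(smiles):
    """Reduce a molecule to a 0/1 vector over the human-chosen features."""
    return [1 if f in smiles else 0 for f in FEATURES]


def train(rows, labels):
    """Bernoulli Naive Bayes with Laplace (add-one) smoothing."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        prior = len(members) / len(rows)
        probs = [(sum(col) + 1) / (len(members) + 2) for col in zip(*members)]
        model[cls] = (prior, probs)
    return model


def predict(model, row):
    """Return the class with the highest log posterior score."""
    def log_score(prior, probs):
        return math.log(prior) + sum(
            math.log(p if x else 1 - p) for x, p in zip(row, probs))
    return max(model, key=lambda cls: log_score(*model[cls]))


# Invented toy data: labels carry no experimental meaning.
train_smiles = ["CC(=O)O", "c1ccccc1C(=O)O", "CCN", "NCCCl"]
train_labels = ["active", "active", "inactive", "inactive"]

model = train([featurise(s) for s in train_smiles], train_labels)
print(predict(model, featurise("CCC(=O)O")))
print(predict(model, featurise("CCCN")))
```

The model can only ever be as good as the feature list: anything the human did not think to encode is invisible to it, which is the limitation that learned features aim to remove.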

Now, deep neural networks are able to identify for themselves the key features that explain the data, with far less bias from human input. Raw representations of molecules, plain text and images can be fed directly to the AI, sometimes with astonishing results.

Classifiers trained on 50-100 documents can automatically prioritise large numbers of documents by interest. The rules of chemistry can be learned by a neural network from 50,000 diverse molecules simply represented as plain text, and large numbers of novel molecules can then be generated to propose synthetic ideas. Binding sites on protein surfaces and interactions with ligands can be characterised without the traditional simplistic identification of hydrogen bonds and hydrophobic interactions.
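To make "molecules simply represented as plain text" concrete, the sketch below shows one common way of turning such a string into numeric input for a neural network: each character becomes a one-hot vector. It assumes the text format is SMILES and uses naive character-level tokenisation (so a two-letter atom such as Cl would be split in two); the function names are hypothetical and no network is trained here.

```python
def build_vocab(smiles_list):
    """Map every character seen in the corpus to an integer index."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot(smiles, vocab):
    """Encode a string as one row per character, one column per vocab entry."""
    rows = []
    for ch in smiles:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1
        rows.append(row)
    return rows


# Two small molecules (ethanol and benzene) as SMILES strings.
corpus = ["CCO", "c1ccccc1"]
vocab = build_vocab(corpus)
print(vocab)
print(one_hot("CCO", vocab))
```

Nothing in this encoding tells the model what a ring or a functional group is; any such notion has to be learned from the sequences themselves, which is the contrast with hand-defined substructure features.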

Examples where not just a single step but an entire process can be performed by an AI are now emerging. Entire multistep chemical syntheses can be designed with quality approaching that of a skilled human.

The new developments in AI are already finding practical uses in drug discovery and their application will grow, particularly where complex non-linear relationships in the data exist, and where data is difficult to reduce to defined features. Random Forests, XGBoost, Naïve Bayes and other approaches remain highly competitive in many situations though, and represent a high bar for the newer developments to beat.

About the author

Dr Andrew Pannifer is Lead Scientist in Cheminformatics at Medicines Discovery Catapult.

After a PhD in Molecular Biophysics at Oxford University, mapping the reaction mechanism of protein tyrosine phosphatases, he entered the pharmaceutical industry in 2002. Firstly at AstraZeneca and then at Pfizer, he performed structure-based drug design and crystallography, and in 2010 joined the CRUK Beatson Institute Drug Discovery Programme to start up Structural Biology and Computational Chemistry.
