Recent publications

Journal of Chemical Information and Modeling, 19 March 2019

GuacaMol: Benchmarking Models for De Novo Molecular Design

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks have recently appeared and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies against well-established algorithms have seldom been performed. To standardise the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardised benchmarks. The benchmark tasks encompass measuring the fidelity of the models in reproducing the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single- and multi-objective optimisation tasks. The benchmarking framework is available as an open-source Python package.

Machine Learning for Molecules and Materials, NeurIPS 2018 Workshop

Generating novel molecules with optimal properties is a crucial step in many industries, such as drug discovery. Recently, deep generative models have emerged as a promising approach to de novo molecular design. Existing graph generative models, however, either have a number of parameters that depends on graph size, limiting their use to very small graphs, or are formulated as a sequence of discrete actions that construct a graph, making the output graph non-differentiable with respect to the model parameters and therefore preventing their use in scenarios such as conditional graph generation. In this work we propose a model for conditional graph generation that is computationally efficient and enables direct optimisation of the graph. We demonstrate favourable performance of our model on prototype-based molecular graph conditional generation tasks.

Adjusting for Confounding in Unsupervised Latent Representations of Images

Biological imaging data are often partially confounded or contain unwanted variability. Examples of such phenomena include variable lighting across microscopy image captures, stain intensity variation in histological slides, and batch effects in high-throughput drug screening assays. To develop "fair" models that generalise well to unseen examples, it is therefore crucial to learn data representations that are insensitive to nuisance factors of variation. In this paper, we present a strategy based on adversarial training that is capable of learning unsupervised representations invariant to confounders. As an empirical validation of our method, we use deep convolutional autoencoders to learn unbiased cellular representations from microscopy images.

Machine Learning in Health Workshop, NeurIPS 2018

In this work, we provide a new formulation of Graph Convolutional Neural Networks (GCNNs) for link prediction on graph data that addresses common challenges for biomedical knowledge graphs (KGs). We introduce a regularized attention mechanism to GCNNs that not only improves performance on clean datasets but also accommodates noise in KGs, a pervasive issue in real-world applications. Further, we explore new visualization methods that support interpretable modelling and illustrate how the learned representation can be exploited to automate dataset denoising. The results are demonstrated on a synthetic dataset, the common benchmark dataset FB15k-237, and a large biomedical knowledge graph derived from a combination of noisy and clean data sources. Using these improvements, we visualize a learned model's representation of the disease cystic fibrosis and demonstrate how to interrogate a neural network to show the potential of PPARG as a candidate therapeutic target for rheumatoid arthritis.

Future Medicinal Chemistry, 13 August 2018

Artificial intelligence in drug discovery

Matthew A Sellwood, Mohamed Ahmed, Marwin HS Segler & Nathan Brown

There has been a great deal of hype surrounding the resurgence of artificial intelligence (AI) and machine learning (ML). This commentary, published in Future Medicinal Chemistry, gives a brief overview of the AI and ML domains and their relevance to different aspects of drug discovery and, importantly, reflects on managing expectations from different quarters. The key themes covered are molecular design approaches (including our recent paper on de novo design models), predictive modelling, synthesis planning, and closing the feedback loop to learn from our decisions.

British Medical Journal, 7 June 2018

ClinicalTrials.gov is the world’s largest primary registry of clinical studies. For almost two decades it has helped physicians, patients, and regulators identify relevant trials and collect evidence. It also offers a unique opportunity to explore, examine, and monitor the clinical research landscape. In our recent research paper, we used ClinicalTrials.gov registry data to conduct a comprehensive, large-scale analysis of registered clinical trials and to investigate trends in their design and transparency.

Chapter Five - Big Data in Drug Discovery

Modern scientific discovery is driven by data and by learning from those data. This book chapter offers an overview of the data sources relevant to drug discovery and of how these data can, and do, have an impact on our research and predictions, helping us make better-informed decisions and adapt our discovery research more rapidly to progress drugs to the clinic.

Nature Chemistry, 4 April 2018

Organic synthesis provides opportunities to transform drug discovery

Ian Churcher et al

Ian Churcher, VP Drug Discovery, recently co-authored a paper in Nature Chemistry highlighting how organic synthesis could offer the pharmaceutical industry an opportunity to improve drug development. The paper presents the current challenges the industry needs to overcome and explains why new technologies and industry-academia collaborations are essential to progress.

Nature, 28 March 2018

Planning chemical syntheses with deep neural networks and symbolic AI

Marwin Segler et al

The AI technology developed by Marwin Segler uses deep neural networks to learn from essentially all reactions ever published in organic chemistry (12.4 million of them). Combined with modern tree search algorithms, this makes it possible to plan the synthesis of novel molecules. The technology augments chemists' ability to make molecules faster, increasing the success rate of synthetic chemistry and the speed and efficiency of drug development in general.

OpenReview, ICLR 2018, 27 March 2018

The essence of molecular design is to generate molecules that fulfil a property profile desirable in a drug. In this paper we consider a number of different generative models for designing new molecular structures that satisfy multiple specific objectives desirable for a particular drug discovery project. In addition to evaluating multiple generative models, we also present a benchmarking dataset to the community, with the aim of providing an objective basis for evaluating new de novo molecular design models.

ChemMedChem, 20 March 2018

Special Issue: Cheminformatics in Drug Discovery

Andreas Bender, Nathan Brown

BenevolentAI guest edited a special issue of ChemMedChem in early 2018: our Head of Cheminformatics, Nathan Brown, co-edited it with Andreas Bender of the University of Cambridge. The special issue consisted of twenty original research papers from leading names in the field and was introduced by a guest editorial written by Nathan and Andreas. It covered a broad range of cheminformatics topics, from recent work on machine learning in drug discovery to large-scale analyses of protein structure and ligand-binding data.