New machine learning algorithm can help search for new drugs

Researchers say they have developed a machine learning algorithm for drug discovery which is twice as efficient as the industry standard, and could accelerate the process of developing new treatments for diseases such as Alzheimer’s. The team led by researchers at the University of Cambridge in the UK used the algorithm to identify four new molecules that activate a protein thought to be relevant for symptoms of Alzheimer’s disease and schizophrenia. A key problem in drug discovery is predicting whether a molecule will activate a particular physiological process, according to the study published in the journal PNAS.

It is possible to build a statistical model by searching for chemical patterns shared among molecules known to activate that process, but the data available to build these models is limited because experiments are costly, and it is unclear which chemical patterns are statistically significant. Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve automatically from experience without being explicitly programmed. “Machine learning has made significant progress in areas such as computer vision where data is abundant,” said Alpha Lee from Cambridge’s Cavendish Laboratory.

“The next frontier is scientific applications such as drug discovery, where the amount of data is relatively limited but we do have physical insights about the problem, and the question becomes how to marry data with fundamental chemistry and physics,” said Lee. The algorithm, developed by the researchers in collaboration with biopharmaceutical company Pfizer, uses mathematics to separate pharmacologically relevant chemical patterns from irrelevant ones. It looks at molecules known to be active as well as molecules known to be inactive, and learns to recognise which parts of the molecules are important for drug action and which parts are not.

A mathematical principle known as random matrix theory gives predictions about the statistical properties of a random and noisy dataset. These predictions are then compared against the statistics of the chemical features of active and inactive molecules to distil which chemical patterns are truly important for binding, as opposed to arising simply by chance. This methodology allows the researchers to fish out important chemical patterns not only from molecules that are active, but also from molecules that are inactive. In other words, failed experiments can now be exploited with this technique.
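The article does not spell out the mathematics, so the following is only a rough sketch of the general idea of comparing data against a random matrix prediction, not the researchers' actual method. It assumes a standard Marchenko-Pastur test: for pure noise, the eigenvalues of a feature correlation matrix should stay below a known upper edge, so eigenvalues above that edge are treated as candidate patterns. The function name `significant_patterns`, the synthetic data and the planted pattern are all hypothetical and chosen for illustration.

```python
import numpy as np


def significant_patterns(features):
    """Keep correlation-matrix eigenpairs above the Marchenko-Pastur noise edge.

    `features` is an (n_molecules, n_features) array. This is an illustrative
    assumption about how a random-matrix filter could look, not the published
    algorithm.
    """
    n_samples, n_features = features.shape

    # Standardise each feature column to zero mean and unit variance.
    std = features.std(axis=0)
    std[std == 0] = 1.0  # constant columns simply become all zeros
    z = (features - features.mean(axis=0)) / std

    # Sample correlation matrix and its eigen-decomposition (ascending order).
    corr = z.T @ z / n_samples
    eigvals, eigvecs = np.linalg.eigh(corr)

    # Largest eigenvalue expected from unit-variance noise alone.
    upper_edge = (1.0 + np.sqrt(n_features / n_samples)) ** 2

    keep = eigvals > upper_edge
    return eigvals[keep], eigvecs[:, keep], upper_edge


# Synthetic stand-in for chemical fingerprints: 500 "molecules", 50 features.
rng = np.random.default_rng(0)
noise = rng.standard_normal((500, 50))

# Plant one shared pattern on the first five features (hypothetical signal).
pattern = np.zeros(50)
pattern[:5] = 1.0
strength = rng.standard_normal((500, 1))
data = noise + 2.0 * strength * pattern

vals, vecs, edge = significant_patterns(data)

# The eigenvector with the largest retained eigenvalue is the last column.
top = vecs[:, -1]
weight_on_planted = float(np.sum(top[:5] ** 2))
print(f"noise edge: {edge:.3f}, retained eigenvalues: {len(vals)}")
print(f"share of top pattern on planted features: {weight_on_planted:.2f}")
```

In this toy setting the leading retained eigenvector concentrates on the planted features, while most of the remaining spectrum falls inside the range that noise alone would produce. Real fingerprint data is binary and correlated, so a practical filter would need more care than this sketch takes.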

The researchers built a model starting with 222 active molecules, and were able to computationally screen an additional six million molecules. They purchased and screened the 100 most relevant molecules. From these, they identified four new molecules that activate the CHRM1 receptor, a protein that may be relevant for Alzheimer’s disease and schizophrenia. “The ability to fish out four active molecules from six million is like finding a needle in a haystack,” said Lee. “A head-to-head comparison shows that our algorithm is twice as efficient as the industry standard,” he said.

Making complex organic molecules is a significant challenge in chemistry, and potential drugs abound in the space of yet-unmakeable molecules. The researchers are currently developing algorithms that predict ways to synthesise complex organic molecules, as well as extending the machine learning methodology to materials discovery.