Machine learning enables syntheses to be planned with unprecedented efficiency

In 1996, when a computer won a game against the then reigning world chess champion Garry Kasparov, it was nothing short of a sensation. After this breakthrough in the world of chess, the board game Go was long considered a bastion reserved for human players because of its complexity. Nowadays, however, the world’s best players no longer have any chance of winning against the “AlphaGo” software. The recipe for this computer programme’s success is a combination of the so-called Monte Carlo Tree Search and deep neural networks based on machine learning and artificial intelligence. A team of researchers from the University of Münster has now demonstrated that this combination is extremely well suited to planning chemical syntheses – so-called retrosyntheses – with unprecedented efficiency. The study has been published in the current issue of the journal “Nature”.

Marwin Segler, the lead author of the study, puts it in a nutshell: “Retrosynthesis is the ultimate discipline in organic chemistry. Chemists need years to master it – just like with chess or Go. In addition to straightforward expertise, you also need a good deal of intuition and creativity for it. So far, everyone assumed that computers couldn’t keep up without experts programming in tens of thousands of rules by hand. What we have shown is that the machine can, by itself, learn the rules and their applications from the available literature.”

Retrosynthesis is the standard method for designing the production of chemical compounds. The principle is that, working backwards mentally, the compound is broken down into ever smaller components until basic building blocks are reached. This analysis provides the “cooking recipe”, which is then used for working “forwards” in the laboratory to produce the target molecule, proceeding from the starting materials. Although easy in theory, the process presents difficulties in practice. “Just like in chess, in every step or move you’ve got a variety of possibilities to choose from,” says Segler. “In chemistry, however, there are orders of magnitude more possible moves than in chess, and the problem is much more complex.”
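
The backwards-then-forwards principle can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the compound names are placeholders rather than real chemistry, and the names DISCONNECTIONS, PURCHASABLE and plan_route are hypothetical. Each compound here has exactly one way of being broken down, so nothing needs to be searched; the difficulty Segler describes arises because real molecules offer many alternatives at every step.

```python
# Hypothetical toy data: each product maps to the precursors it is made from.
# The names are placeholders, not real chemistry.
DISCONNECTIONS = {
    "target": ["intermediate_A", "intermediate_B"],
    "intermediate_A": ["building_block_1", "building_block_2"],
    "intermediate_B": ["building_block_2", "building_block_3"],
}

# Basic components that need no further breakdown.
PURCHASABLE = {"building_block_1", "building_block_2", "building_block_3"}


def plan_route(compound):
    """Work backwards from `compound`, then return the steps in forward order.

    Each step is a pair (precursors, product).
    """
    if compound in PURCHASABLE:
        return []  # a starting material: nothing to make
    if compound not in DISCONNECTIONS:
        raise ValueError(f"no known way to make {compound!r}")

    precursors = DISCONNECTIONS[compound]
    steps = []
    for precursor in precursors:
        # Recurse: every precursor must itself be traced back to basics.
        steps.extend(plan_route(precursor))
    # The step that makes `compound` comes after all of its precursors.
    steps.append((tuple(precursors), compound))
    return steps


if __name__ == "__main__":
    for precursors, product in plan_route("target"):
        print(" + ".join(precursors), "->", product)
```

The recursion runs backwards from the target, while the returned list reads forwards, like the “cooking recipe” followed in the laboratory.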

This is where the new method comes into play, linking deep neural networks with the Monte Carlo Tree Search – a combination so promising that a large number of researchers from a variety of disciplines are currently working on it. The Monte Carlo Tree Search is a method for assessing moves in a game. At every move, the computer simulates numerous variants, for example how a game of chess might end. The most promising move is then selected.
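
To make the idea of the Monte Carlo Tree Search more concrete, here is a minimal Python sketch on a deliberately tiny game. It is not the researchers’ software. The game is an assumption chosen for brevity: two players alternately take one to three stones from a pile, and whoever takes the last stone wins. The names Node, rollout and mcts are hypothetical, and the selection rule used is the common UCT formula, which the article does not specify.

```python
import math
import random

MAX_TAKE = 3  # toy rule: take 1 to 3 stones; taking the last stone wins


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones              # stones left for the player to move
        self.parent = parent
        self.move = move                  # move that led here from the parent
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                   # wins for the player who moved INTO this node

    def best_child(self, c=1.4):
        # UCT: average result plus an exploration bonus for rarely tried moves.
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def rollout(stones, rng):
    """Play random moves to the end; True if the player to move at the start wins."""
    start_player_to_move = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return start_player_to_move
        start_player_to_move = not start_player_to_move
    return False  # pile already empty: the player to move has lost


def mcts(root_stones, iterations=5000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.best_child()
        # 2. Expansion: add one new child if a move is still untried.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: finish the game with random moves.
        previous_player_wins = not rollout(node.stones, rng)
        # 4. Backpropagation: flip the viewpoint at every level.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if previous_player_wins else 0.0
            previous_player_wins = not previous_player_wins
            node = node.parent
    # The most visited move is taken as the most promising one.
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print("Suggested move with 5 stones:", mcts(5))
```

In this toy game the winning strategy is to leave the opponent a multiple of four stones, so with a pile of five the search should settle on taking one. The point of the sketch is that the program is never told this rule; the preference emerges from many simulated games.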

In a similar way, the computer now looks for the best possible “moves” for the chemical synthesis. It is also able to learn by using deep neural networks. To this end, the computer draws on all the chemical literature ever published, which describes almost 12 million chemical reactions. Mike Preuss, an information systems specialist and co-author of the study, summarizes it as follows, in a somewhat simplified way: “The deep neural networks are used for predicting which reactions are possible with a certain molecule. Using the Monte Carlo Tree Search, the computer can test whether the reactions predicted really do lead to the target molecule.”

The idea of using computers to plan syntheses isn’t new. “The idea is actually about 60 years old,” says Segler. “People thought it would be enough, as in the case of chess, to enter a large number of rules into the computer. But that didn’t work. Chemistry is very complex and, in contrast to chess or Go, it can’t be grasped purely logically using simple rules. Added to this is the fact that the number of publications with new reactions doubles every ten years or so. Neither chemists nor programmers can keep up with that. We need the help of an ‘intelligent’ computer.” The new method is about 30 times faster than conventional programmes for planning syntheses and it finds potential synthesis routes for twice as many molecules.

In a double-blind A/B test, the Münster researchers found that chemists consider these computer-generated synthesis routes to be just as good as existing tried-and-tested ones. “We hope that, using our method, chemists will not have to try out so much in the lab,” Segler adds, “and that as a result, and using fewer resources, they will be able to produce the compounds which make our high standard of living possible.”

The work received funding from the German Research Foundation as part of Collaborative Research Centre 858, “Synergetic Effects in Chemistry”.

The authors of the study:

Marwin Segler and Prof. Mark Waller, both chemists, carried out the study together with Dr. Mike Preuss, an information systems specialist, at the University of Münster. Segler is a doctoral student at the Institute of Organic Chemistry, and Waller now works at Shanghai University in China. Preuss is a post-doc at the Institute of Information Systems and is an expert on artificial intelligence.