D&S fellow Sorelle Friedler’s latest research appears in Nature. She and her team document their creation of a machine-learning algorithm that accurately predicts new ways to make crystals. The team trained the algorithm using data from both successful and “unsuccessful” experiments and trials.

Abstract:

Inorganic–organic hybrid materials such as organically templated metal oxides, metal–organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error.

Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure–property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions (failed or unsuccessful hydrothermal syntheses) collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions for new organically templated inorganic product formation with a success rate of 89 per cent. Inverting the machine-learning model reveals new hypotheses regarding the conditions for successful product formation.
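The workflow the abstract describes (notebook records, added property descriptions, a trained model, a prediction for an untested building block) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the amine names, descriptor values, reaction records and feature scaling are invented, and nearest-neighbour voting stands in for whatever model the authors actually used, which the abstract does not name.

```python
from math import dist

# Hypothetical descriptor table standing in for a cheminformatics lookup.
# Names and values are invented: (molecular weight, number of amine groups).
DESCRIPTORS = {
    "amine_A": (60.0, 2.0),
    "amine_B": (88.0, 2.0),
    "amine_C": (101.0, 1.0),
    "amine_D": (74.0, 2.0),  # never used in any recorded reaction
}

# Hypothetical notebook records: (amine, temperature in C, pH, succeeded?).
# Failed ("dark") reactions are kept alongside the successful ones.
NOTEBOOK = [
    ("amine_A", 90.0, 3.0, True),
    ("amine_A", 90.0, 7.0, False),
    ("amine_B", 110.0, 3.5, True),
    ("amine_B", 110.0, 8.0, False),
    ("amine_C", 90.0, 4.0, True),
    ("amine_C", 90.0, 7.5, False),
]


def featurize(amine, temp_c, ph):
    """Join the raw record with its descriptors and roughly rescale each field."""
    mol_weight, n_amines = DESCRIPTORS[amine]
    return (mol_weight / 100.0, n_amines, temp_c / 100.0, ph / 10.0)


def train(records):
    """'Training' a nearest-neighbour model just stores the labelled feature vectors."""
    return [(featurize(a, t, p), ok) for a, t, p, ok in records]


def predict(model, amine, temp_c, ph, k=3):
    """Predict success by majority vote among the k closest recorded reactions."""
    x = featurize(amine, temp_c, ph)
    nearest = sorted(model, key=lambda row: dist(row[0], x))[:k]
    votes = sum(1 for _, ok in nearest if ok)
    return votes * 2 > k


model = train(NOTEBOOK)
low_ph = predict(model, "amine_D", 90.0, 3.0)
high_ph = predict(model, "amine_D", 90.0, 8.0)
```

In this invented data set the failures are what let the sketch separate the two conditions proposed for the untested `amine_D`; with the failed records removed, every neighbour would vote for success.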

Most chemical reactions that are performed are never reported because they are deemed “unsuccessful”. Usually this is because they yield too little (or no) product, or because the product falls short of a required level of purity. Nevertheless, such data are important because they define bounds on the space of successful reactions, and because they help explain the physical parameters that govern those reactions. This project seeks to use historical synthesis data to train machine-learning models that make better hypotheses and predictions about the success of reactions ahead of time.
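A deliberately simplified sketch of why failed reactions "define bounds" follows. It assumes a single reaction parameter (pH is used here as a hypothetical example) and assumes that successes lie below failures along it; real reaction spaces have many interacting parameters, and the numbers are invented.

```python
def boundary_interval(successes, failures):
    """Bracket the upper limit of the success region along one parameter.

    Assumes reactions succeed below some unknown threshold and fail above it.
    Returns (highest known success, lowest known failure above it).
    """
    low = max(successes)
    above = [f for f in failures if f > low]
    high = min(above) if above else float("inf")
    return low, high


# Invented pH values for illustration only.
only_successes = boundary_interval([3.0, 3.5, 4.0], [])
with_failures = boundary_interval([3.0, 3.5, 4.0], [6.0, 7.0])
```

With successes alone the upper limit is unbounded; adding the two failed runs narrows it to somewhere between pH 4.0 and 6.0, which is the kind of information that is lost when failures go unreported.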