Interpretation of material spectra can be data-driven using machine learning

Spectroscopy techniques are commonly used in materials research because they enable identification of materials from their unique spectral features. These features are correlated with specific material properties, such as atomic configurations and chemical bond structures. Modern spectroscopy methods can rapidly generate enormous numbers of material spectra, but these spectra must still be interpreted to extract relevant information about the material under study.

However, interpreting a spectrum is not always simple and requires considerable expertise. Each spectrum is compared with a database containing numerous reference material properties, but spectral features that are not present in the database are problematic and often have to be interpreted using spectral simulations and theoretical calculations. In addition, modern spectroscopy instruments can generate tens of thousands of spectra from a single experiment, which places considerable strain on conventional human-driven interpretation methods. A more data-driven approach is thus required.
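The conventional database comparison described above can be illustrated with a minimal sketch. It assumes that the database stores reference spectra sampled on the same energy grid as the measurement and that cosine similarity is an acceptable measure of agreement. Neither assumption comes from the article, and the names `cosine_similarity`, `best_match`, and the reference values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors (1.0 = same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(measured, references):
    """Return (label, similarity) for the most similar reference spectrum."""
    scored = ((label, cosine_similarity(measured, ref))
              for label, ref in references.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical reference intensities on a shared five-point energy grid.
references = {
    "reference A": [0.0, 0.2, 1.0, 0.4, 0.1],
    "reference B": [0.0, 0.9, 0.3, 0.1, 0.0],
}
measured = [0.0, 0.25, 0.9, 0.5, 0.1]

label, score = best_match(measured, references)
print(label, round(score, 3))
```

A lookup of this kind can only return entries that already exist in the database, which is why features absent from the database need simulation or theoretical calculation.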

Big data analysis techniques have been attracting attention in materials science, and researchers at The University of Tokyo Institute of Industrial Science realized that such techniques could be used to interpret much larger numbers of spectra than traditional approaches can handle. "We developed a data-driven approach based on machine learning techniques using a combination of the layer clustering and decision tree methods," states co-corresponding author Teruyasu Mizoguchi.

The team used theoretical calculations to construct a spectral database in which each spectrum had a one-to-one correspondence with its atomic structure and where all spectra contained the same parameters. Combining the two machine learning methods allowed the team to develop both a spectral interpretation method and a spectral prediction method; the latter is used when a material's atomic configuration is known.
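The following toy sketch shows how clustering and a decision tree can be combined for interpretation and prediction. It is not the team's implementation. It assumes that the clustering step is bottom-up (agglomerative) clustering with average linkage, that the tree is trained to map structural descriptors to cluster labels, and that a predicted spectrum is the mean of the matching cluster. The descriptors (bond length and coordination number), the data values, and all function names are hypothetical.

```python
from collections import Counter

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(spectra, n_clusters):
    """Bottom-up clustering with average linkage; returns one label per spectrum."""
    clusters = [[i] for i in range(len(spectra))]
    def linkage(c1, c2):
        total = sum(distance(spectra[i], spectra[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))
    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)          # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(spectra)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3):
    """Return a leaf (cluster label) or a node (feature, threshold, left, right)."""
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:                      # no feature can separate the samples
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def classify(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def mean_spectrum(spectra, labels, label):
    members = [s for s, lab in zip(spectra, labels) if lab == label]
    return [sum(col) / len(members) for col in zip(*members)]

# Hypothetical database: [bond length, coordination number] -> spectrum.
descriptors = [[1.6, 4], [1.7, 4], [2.0, 6], [2.1, 6]]
spectra = [[0.1, 1.0, 0.2], [0.2, 0.9, 0.2], [0.9, 0.2, 0.1], [1.0, 0.1, 0.2]]

labels = agglomerate(spectra, n_clusters=2)      # group similar spectra
tree = build_tree(descriptors, labels)           # relate structure to groups
predicted = mean_spectrum(spectra, labels, classify(tree, [2.05, 6]))
print(labels, tree, predicted)
```

In this sketch the splits chosen by the tree indicate which structural descriptors separate the spectral groups (interpretation), and following the tree for a new structure selects a representative spectrum (prediction).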

The method was successfully applied to interpretation of complex spectra from two core-electron loss spectroscopy methods, energy-loss near-edge structure (ELNES) and X-ray absorption near-edge structure (XANES), and was also used to predict the spectral features when material information was provided. "Our approach has the potential to provide information about a material that cannot be determined manually and can predict a spectrum from the material's geometric information alone," says lead author Shin Kiyohara.

The proposed machine learning method is not restricted to ELNES/XANES spectra and can be used to analyze any spectral data quickly and accurately without the need for specialist expertise. As a result, the method is expected to have wide applicability in fields as diverse as semiconductor design, battery development, and catalyst analysis.