New software can predict a wide range of reaction outcomes and is more flexible than other programs when dealing with completely different chemical problems. The machine-learning platform, which uses structure-based molecular representations instead of big reaction-based datasets, could find diverse applications in organic chemistry.

Although machine-learning methods have been widely used to predict the molecular properties and biological activities of target molecules, their application in predicting reaction outcomes has been limited because current models usually can’t be transferred to different problems. Instead, complex parameterisation is required for each individual case to achieve good results. Researchers in Germany are now reporting a general approach that overcomes this limitation.

‘Previous models for accurately predicting reaction results have been highly complex and problem-specific,’ says Frank Glorius of the University of Münster, Germany, who led the study. ‘They are mostly based on a previously gained understanding of the underlying processes and cannot be transferred to other problems. In our approach, we use a universal representation of the involved compounds, which is solely based on their molecular structures. This allows for a general applicability of our program to diverse problem sets.’

The new tool is based on the assumption that reactivity can be directly derived from a molecule’s structure and uses an input based on ‘multiple fingerprint features’ as an all-round molecular representation. Frederik Sandfort, who also participated in the research, explains that organic compounds can be represented as graphs on which simple structural (yes/no) queries can be carried out. ‘Fingerprints are number sequences based on the combination of many such successive queries,’ he says. ‘They have originally been developed for structural similarity searches and were proven to be well-suited for application in computational models. We use a large number of different fingerprints to represent the molecular structure of each compound as accurately as possible.’
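To make the idea of a fingerprint concrete, here is a minimal sketch in Python of how a sequence of yes/no structural queries on a molecular graph can yield a number sequence. It is an illustration only and not the researchers' code: the `Molecule` class, the five queries and the two example molecules are all assumptions chosen for brevity, and real fingerprints combine far more queries. Because fingerprints were first developed for similarity searches, the sketch ends with a Tanimoto comparison, a standard similarity measure for bit vectors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Toy molecular graph (hypothetical, heavy atoms only)."""
    atoms: tuple  # element symbols, indexed by position
    bonds: tuple  # (atom_index, atom_index, bond_order) triples


def has_element(symbol):
    """Build a yes/no query: does the molecule contain this element?"""
    return lambda mol: symbol in mol.atoms


def has_bond(a, b, order):
    """Build a yes/no query: is there a bond of this order between a and b?"""
    def query(mol):
        return any(
            o == order and {mol.atoms[i], mol.atoms[j]} == {a, b}
            for i, j, o in mol.bonds
        )
    return query


# An arbitrary, illustrative set of structural queries (an assumption).
QUERIES = [
    has_element("O"),
    has_element("N"),
    has_bond("C", "O", 2),  # carbonyl C=O
    has_bond("C", "O", 1),  # C-O single bond
    has_bond("C", "C", 2),  # C=C double bond
]


def fingerprint(mol, queries):
    """Answer each query in turn and record 1 for yes, 0 for no."""
    return [int(q(mol)) for q in queries]


def tanimoto(fp_a, fp_b):
    """Shared on-bits divided by on-bits set in either fingerprint."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 0.0


acetic_acid = Molecule(("C", "C", "O", "O"), ((0, 1, 1), (1, 2, 2), (1, 3, 1)))
ethanol = Molecule(("C", "C", "O"), ((0, 1, 1), (1, 2, 1)))

fp_acid = fingerprint(acetic_acid, QUERIES)  # [1, 0, 1, 1, 0]
fp_ethanol = fingerprint(ethanol, QUERIES)   # [1, 0, 0, 1, 0]
similarity = tanimoto(fp_acid, fp_ethanol)   # 2 shared on-bits out of 3
```

The two molecules differ only in the carbonyl query, so their fingerprints share two of the three bits that are set in either one.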

Glorius points out that their platform is very versatile. ‘While our model can be used to predict molecular properties, its most important application is the accurate prediction of reaction results,’ he says. ‘We could predict enantioselectivities and yields with comparable accuracy to previous problem-specific models. Furthermore, the model was applied to predicting relative conversion based on a high-throughput data set which was never tackled using machine learning before.’

The program is also easy to use, the researchers say. ‘It only requires the input data in a very simple form and some problem-specific settings,’ explains Sandfort. He adds that the tool is already online and will be updated further with the team’s most recent developments.

Robert Paton at Colorado State University and the Center for Computer Assisted Synthesis, US, who was not involved in the study, notes that machine-learning methods are being increasingly used to identify patterns in data that can help to predict the outcome of experiments. ‘Chemists have managed to harness these techniques by converting molecular structures into vectors of numbers that can then be passed to learning algorithms,’ he says. ‘Representations using information only from a molecule’s atoms and their connectivity are agnostic to the particular reaction and as a result may be used across multiple reaction types for different types of predictions. Future developments in interpreting these predictions – a challenge shared by all machine learning approaches – will be valuable.’
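The conversion of structures into vectors that Paton describes can be sketched along the following lines. This is a toy illustration in Python under stated assumptions, not the platform's actual featurisation: the two fingerprint functions work on SMILES strings as plain text (so, for instance, chlorine would be miscounted as carbon), and the function names, the element list and the vector sizes are all hypothetical. The point is only that several fingerprints per compound, computed from structure alone, can be joined into one fixed-length vector for a whole reaction, whatever the reaction type.

```python
import zlib


def element_counts(smiles, elements=("C", "N", "O", "F")):
    """Toy count fingerprint: how often each element symbol appears.

    Counts upper-case and lower-case (aromatic) characters; it treats the
    SMILES string as plain text, which is a deliberate simplification.
    """
    return [smiles.count(e) + smiles.count(e.lower()) for e in elements]


def hashed_substrings(smiles, n=2, size=16):
    """Toy bit fingerprint: hash every n-character substring to one bit.

    zlib.crc32 is used because it gives the same value on every run.
    """
    bits = [0] * size
    for i in range(len(smiles) - n + 1):
        bits[zlib.crc32(smiles[i:i + n].encode()) % size] = 1
    return bits


def featurize(smiles, fingerprinters):
    """Concatenate several fingerprints of one compound into one vector."""
    vector = []
    for fp in fingerprinters:
        vector.extend(fp(smiles))
    return vector


def reaction_vector(components, fingerprinters):
    """Concatenate the vectors of all reaction components, in a fixed order."""
    vector = []
    for smiles in components:
        vector.extend(featurize(smiles, fingerprinters))
    return vector


FINGERPRINTERS = [element_counts, hashed_substrings]

# Illustrative components: acetic acid and ethanol, written as SMILES.
components = ["CC(=O)O", "CCO"]
x = reaction_vector(components, FINGERPRINTERS)
# x has 2 * (4 + 16) = 40 entries and could be one row of a training matrix.
```

Each reaction becomes one such row, and any standard learning algorithm could then be trained on the rows against measured outcomes such as yield or enantioselectivity.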