Machine-learning algorithms that can predict reaction yields have remained elusive because chemists tend to bury low-yielding reactions in their lab notebooks instead of publishing them, researchers say. ‘We have this image that failed experiments are bad experiments,’ says Felix Strieth-Kalthoff. ‘But they contain knowledge, they contain valuable information both for humans and for an AI.’

Strieth-Kalthoff from the University of Toronto, Canada, and a team led by Frank Glorius at Germany’s University of Münster are asking chemists to start including not only their best but also their worst results in their papers. This, together with less biased reagent selection and standardised reporting of experimental procedures, could finally allow researchers to create accurate yield-prediction algorithms.

Retrosynthesis planning already uses machine-learning models to design shorter, cheaper or non-proprietary synthetic routes. But there have been few attempts to create programs that predict yields, and most of those require researchers to first produce a custom dataset of high-throughput experiments.

‘What would of course be ideal is that … we just take the data that is there, the one in the literature,’ says Strieth-Kalthoff. But doing this for popular reactions like Buchwald–Hartwig aminations and Suzuki couplings generated algorithms that were so inaccurate ‘we could have pretty much just guessed the average [yield] of the training distribution’.

The team showed that while machine-learning algorithms are rather robust to experimental errors – like yield fluctuations due to scale – they are deeply affected by human biases. ‘The whole chemical space and the space of reaction conditions is very broad, but we tend to always do the same thing,’ says Strieth-Kalthoff. This is further reinforced by which chemicals are cheapest and most available. ‘But the factor that we figured out is even more important is that we don’t report all the experimental results that we have.’

Compounding errors

The researchers trained an algorithm on a dataset of high-throughput reactions. When they removed many of the low-yielding examples, the model’s yield-prediction error increased by more than 50% compared with using the entire unaltered dataset. Biasing the training data to include only specific reagent combinations increased the error by 30%. When the team deliberately introduced experimental errors into the dataset’s yields, however, prediction errors remained under 10%.
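A minimal sketch of that kind of stress test – not the authors’ own code, and not expected to reproduce their exact numbers – would train a simple regressor on a featurised reaction table, then retrain it after simulating publication bias (dropping most low-yield rows) or experimental noise (perturbing the yields). The file name, column names and thresholds below are assumptions for illustration only.

```python
# Illustrative sketch only: probing how dataset bias vs experimental noise
# affects a yield-prediction model. Assumes a hypothetical CSV of reaction
# descriptors with a 'yield' column (0-100%).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("reactions_featurised.csv")          # hypothetical dataset
X, y = df.drop(columns="yield").values, df["yield"].values
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def mae_after_training(X_train, y_train):
    """Fit a simple regressor and report error on the held-out test set."""
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(X_train, y_train)
    return mean_absolute_error(y_te, model.predict(X_te))

baseline = mae_after_training(X_tr, y_tr)

# Simulate publication bias: discard most training reactions below 20% yield.
rng = np.random.default_rng(0)
keep = (y_tr >= 20) | (rng.random(len(y_tr)) < 0.1)
biased = mae_after_training(X_tr[keep], y_tr[keep])

# Simulate experimental noise instead: perturb yields but keep every example.
noisy_y = np.clip(y_tr + rng.normal(0, 5, len(y_tr)), 0, 100)
noisy = mae_after_training(X_tr, noisy_y)

print(f"baseline MAE {baseline:.1f}, biased MAE {biased:.1f}, noisy MAE {noisy:.1f}")
```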

Adding fake negative data – random reagent combinations assigned a yield of 0% – actually increased the algorithm’s prediction accuracy. ‘We don’t know what the real yield is [of these reactions], and we might well have introduced some small error, but this strategy actually shows a bit of promise,’ explains Strieth-Kalthoff. ‘But I would, at this stage, not see this as the solution but rather as an emphasis on how important negative data is.’
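As a rough illustration of that augmentation idea – not the authors’ implementation – one could pad a literature-derived training set with randomly drawn reagent combinations labelled as 0% yield. The reagent pools, column names and file name below are invented for the example.

```python
# Illustrative sketch only: augmenting training data with synthetic 'failed'
# reactions, i.e. random reagent combinations assigned a 0% yield label.
import random
import pandas as pd

# Hypothetical pools of building blocks and conditions already in the data.
amines = ["morpholine", "aniline", "piperidine"]
halides = ["4-bromotoluene", "2-chloropyridine"]
catalysts = ["Pd-cat-A", "Pd-cat-B"]
bases = ["K3PO4", "DBU"]

def random_negatives(n, seed=0):
    """Draw random reagent combinations and label them as 0% yield."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rows.append({
            "amine": rng.choice(amines),
            "halide": rng.choice(halides),
            "catalyst": rng.choice(catalysts),
            "base": rng.choice(bases),
            "yield": 0.0,   # assumed label; the true yield of this combination is unknown
        })
    return pd.DataFrame(rows)

train_df = pd.read_csv("literature_reactions.csv")      # hypothetical literature data
augmented = pd.concat(
    [train_df, random_negatives(len(train_df) // 5)],   # e.g. ~20% synthetic negatives
    ignore_index=True,
)
```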

‘It’s a nice way to bring awareness to the different considerations one should make when we think about using existing reaction data for different types of machine learning for predictive chemistry tasks,’ says Connor Coley, who works on computer-assisted chemical discovery at the Massachusetts Institute of Technology, US. The problems data limitations create are well known within the machine-learning community. But with more chemists from experimental backgrounds starting to use AI tools, ‘I think that it’s good to ensure that these topics are being thought about’.

‘I think, more broadly, in the literature, I would not say that [omitting low-yielding reactions] is the only problem or even necessarily the main limitation,’ Coley points out. A big problem, he says, is that literature data is often missing information or is hidden inside text documents. Factors like the order in which reagents are added or whether the mixture is stirred can be crucial.

Raising standards

Reporting all of these details – and in a standardised format – would not only help computers but also human chemists. ‘I think many have probably wasted hours or days trying to replicate a reaction that they have read in a paper,’ Coley says, only to later find out that something as simple as oven-drying the flask made all the difference.

Last year, Coley was part of a team that created the Open Reaction Database. This open-access repository allows organic reaction data to be captured in a structured, machine-readable way. While this is a step towards addressing the technical barriers to data-sharing, there are also cultural barriers, Coley says. ‘We have to actually change the way that people choose to report their data, to use these more structured formats and to be willing to share what they consider to be negative examples.’
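To make the contrast with free-text procedures concrete, the record below sketches the sort of structured, machine-readable entry such a repository aims to capture. The field names are an ad hoc illustration, not the actual Open Reaction Database schema.

```python
# Illustrative ad hoc record only; not the Open Reaction Database format.
reaction_record = {
    "reaction_type": "Buchwald-Hartwig amination",
    "inputs": {
        "aryl_halide": {"name": "4-bromotoluene", "amount_mmol": 1.0},
        "amine": {"name": "morpholine", "amount_mmol": 1.2},
        "catalyst": {"name": "Pd2(dba)3 / XPhos", "loading_mol_percent": 2.0},
        "base": {"name": "NaOtBu", "amount_mmol": 1.4},
        "solvent": {"name": "toluene", "volume_ml": 4.0},
    },
    "procedure": {
        "addition_order": ["catalyst", "base", "aryl_halide", "amine", "solvent"],
        "stirred": True,
        "glassware_oven_dried": True,   # the kind of detail often lost in prose
        "temperature_c": 100,
        "time_h": 16,
    },
    "outcome": {"yield_percent": 0.0, "analysis": "GC-MS"},  # negative results included
}
```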

There are good reasons not to report some failed experiments: they may be the start of a new project you don’t want to be scooped on, for example. But omitting all the 0% yield reactions may just leave other chemists to duplicate effort needlessly, says Strieth-Kalthoff.

Sometimes, though, it’s difficult to find out whether reactions fail because of setup errors or because of inherent reactivity, Coley says. ‘Automation, high-throughput experimentation, standardisation of procedures will all help with that.’

Coupling automation with AI would also take some of the drudgery out of lab work. ‘What I hated most about method development is sitting in front of the balance and weighing in the 40th catalyst to try,’ Strieth-Kalthoff laughs. ‘If we have robotic automated systems to do that, then chemists can really more focus on the higher-level tasks like directing the models into the right direction and finding the right research problems.’