Has your research ever been derailed on discovering that your compound was assigned the wrong structure? You’re not alone. Now, an open-source program can get you back on track by showing where in the structure to check for mistakes. Plus, it can be automated for use with high-throughput robotic syntheses.
DP5, developed by Jonathan Goodman and Alexander Howarth at the University of Cambridge in the UK, is an automated ¹³C NMR structure validator able to tackle tricky cases with the accuracy of an experienced chemist. It builds on the Goodman group’s previous work on DP4-AI, which adds automated NMR processing to the original DP4’s ability to rank a list of potential structures. Where DP4 ranks candidates against one another, the new DP5 program works out how likely a single candidate structure is to be correct.
‘If an experiment gives a high level of confidence that the structure must be one of a list of possibilities, DP4 is still the best option,’ says Goodman. ‘However, in cases where the structure is unknown or uncertain, DP5 gives information that DP4 cannot.’
‘Incorrect assignment is a major problem. Natural products that have been misassigned and then needed to be reassigned are the most obvious examples, but these represent the tip of an iceberg,’ Goodman says. ‘Much more common are molecules that are misassigned within research projects, the error is caught and the project is delayed as a result. Checking structure determination is a central part of all projects that make molecules; doing it quickly and accurately makes projects progress more steadily and more reliably.’
Goodman’s group has been trying to solve the problem of automated structure validation for more than a decade, but the technology has not been up to the challenge. ‘We did not have the data nor the computer power to make it useful,’ says Goodman. ‘Now, with more data, more computer power, and with Alex Howarth’s insights and hard work, we have a useful procedure: DP5.’
Elizabeth Krenske, whose research at the University of Queensland in Australia involves computational prediction of molecular spectroscopic properties for natural products structure determination, agrees that DP5 is ‘indeed an important advance’, adding that ‘due to its automated workflow, the technique should find immediate applications in high-throughput synthesis, as well as in the bespoke synthesis of complex molecules’. She also praises the program’s user-friendly interface, which aids structure revision by labelling each carbon in the molecule with a probability score and colour coding to show what areas may need to be changed.
The program uses DP4-AI to automatically generate a DFT-predicted spectrum from a chemical structure. NMR-AI, a part of DP4-AI, automatically assigns the shifts from raw experimental spectra or from a list of NMR signals. DP5 then uses statistical models specific to each atomic environment to generate a probability score for each carbon in the structure, and combines these per-carbon scores into an overall molecular probability.
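To make the last step concrete, here is a minimal Python sketch of how per-carbon scores might be rolled up into one molecular score and used to flag suspect atoms. It is not DP5’s code. The article does not say how the scores are combined, so the geometric mean used here is an assumption, and the function names, the carbon labels and the 0.5 threshold are all hypothetical.

```python
import math


def molecular_probability(atom_probs):
    """Combine per-carbon probabilities into one molecular score.

    ASSUMPTION: a geometric mean is used purely for illustration;
    the article does not state the combination rule DP5 applies.
    """
    if not atom_probs:
        raise ValueError("need at least one atomic probability")
    values = list(atom_probs.values())
    # A zero score for any carbon drives the geometric mean to zero.
    if any(p <= 0.0 for p in values):
        return 0.0
    log_sum = sum(math.log(p) for p in values)
    return math.exp(log_sum / len(values))


def flag_suspect_atoms(atom_probs, threshold=0.5):
    """Return labels of carbons scoring below a hypothetical threshold."""
    return sorted(label for label, p in atom_probs.items() if p < threshold)


# Made-up scores for a four-carbon fragment, for illustration only.
example = {"C1": 0.95, "C2": 0.90, "C3": 0.20, "C4": 0.85}
overall = molecular_probability(example)
suspects = flag_suspect_atoms(example)
```

In this toy case one low-scoring carbon pulls the overall score down and is the only atom flagged, which mirrors the idea of pointing a chemist at the part of the structure worth rechecking.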
Goodman and Howarth tested the software with 5140 structures and associated experimental spectra from the database NMRShiftDB. They paired each experimental spectrum with the predicted spectra for every structure in the set that had the same number of carbons. For an extra challenge, they also validated the program against a subset of 5330 of the incorrect pairings, weighted so that their error distribution matched that of the correct pairings. DP5 was able to tell correct pairings from incorrect ones in both sets, even the highly similar ones that might puzzle a human.
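The pairing scheme can be sketched in a few lines of Python. This is an illustration of the idea only, not the authors’ test harness: the function name and the input format (a mapping from a structure identifier to its carbon count) are assumptions, and the weighting step for the harder subset is not shown.

```python
from collections import defaultdict
from itertools import permutations


def build_pairings(carbon_counts):
    """Pair experimental spectra with predicted spectra of equal carbon count.

    carbon_counts maps a structure ID to its number of carbons
    (a hypothetical input format). Each pair is written as
    (experimental_id, predicted_id).
    """
    by_count = defaultdict(list)
    for structure_id, n_carbons in carbon_counts.items():
        by_count[n_carbons].append(structure_id)

    # A spectrum paired with its own structure is a correct pairing.
    correct = [(structure_id, structure_id) for structure_id in carbon_counts]

    # A spectrum paired with a different structure of the same
    # carbon count is an incorrect pairing.
    incorrect = []
    for ids in by_count.values():
        incorrect.extend(permutations(ids, 2))
    return correct, incorrect


# Made-up identifiers and carbon counts, for illustration only.
counts = {"a": 3, "b": 3, "c": 5}
correct_pairs, incorrect_pairs = build_pairings(counts)
```

Restricting the incorrect pairings to structures with the same number of carbons keeps the wrong candidates plausible, since a mismatch in carbon count would be spotted without any probability model.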
To demonstrate DP5’s use in a real-world scenario, they set it on 13 actual cases of misassigned and revised structures. DP5 identified all but one of the original incorrect structures, giving them close to zero probability. It also gave nine of the correct structures a high probability of being correct.
Goodman and his team are exploring whether the relatively slow DFT calculations DP5 relies on could be replaced by a quicker machine learning model to make the whole process more efficient. They are also investigating whether the DP5 technique can be applied to the spectra of other nuclides.
This article is open access
A Howarth and J M Goodman, Chem. Sci., 2022, 13, 3507 (DOI: 10.1039/d1sc04406k)
DP5 is available on GitHub