Impressive technological tools are pointless without data transparency

Synthetic organic, medicinal and natural product chemists are rejoicing at the arrival of a new automated NMR data analysis tool. The program, called DP4-AI, is set to save researchers time when interpreting their NMR data. Elucidating a chemical structure is the rate-limiting step in many studies, so by spending less time on structural assignments, chemists can focus their energy on other bottlenecks. Moreover, the tool is open source, so it is likely to be used far and wide.

The development also serves as a reminder that operator bias and human error can affect structure determinations. The team behind DP4-AI say that, in a number of cases, the program uncovered structural misassignments in the data they used to test it. Incorrect structures litter the chemical literature, and DP4-AI can help to address this long-standing problem by preventing researchers' preconceived ideas from prejudicing a characterisation.

Then there's the separate but associated debate over how and where NMR data is stored. Many researchers publish only images of their NMR spectra or, in the worst cases, just lists of chemical shifts; the raw NMR data is rarely accessible. Various independent NMR databases have surfaced in recent years, including the NMR Raw Data Initiative.1 And while establishing a data standard would be ideal (and efforts are ongoing), we can at least put all the existing raw data in one place. Structural biologists deposit atomic coordinates in the Protein Data Bank before publishing structural studies, which shows what can be achieved when a community supports standardisation. Just saying.

Perhaps chemistry journals should lead the way. Requiring raw NMR data that preserves all the chemical information would enhance the traceability and reproducibility of studies probing molecular connectivity and 3D spatial relationships.

Contemporary NMR spectroscopy is one of the most powerful analytical tools in science. Let's strengthen that standing further with exemplary data transparency. Otherwise, what's the point of improving NMR technologically if the integrity and reproducibility of its results can all too easily be cast into doubt?