Canadian researchers find that the chances of a data set being of any use to science fall by 17% a year


Research data are rapidly being lost to science as time passes, Canadian researchers have confirmed. As individual researchers are not preserving their data for posterity, there is a pressing need for tougher rules on data-sharing in public archives, the team concludes.

Governments, funding agencies and journals are already introducing policies to ensure that research data are available on public archives. They are increasingly concerned that authors are often unable or unwilling to share their data, making them poor stewards of their research, particularly in the long term.

In a systematic analysis of data availability over time, the Canadian team, led by Timothy Vines from the University of British Columbia, confirmed that the older the article, the harder it was to recover the data. They report that defunct email addresses and obsolete storage devices were the main obstacles to data sharing.

To prevent differences in data storage media and research community practices from confusing the results, the team focused on recovering data from one specific area: articles containing morphological data from plants or animals that made use of a particular analysis. The sample consisted of 516 articles published between 1991 and 2011.

The team found at least one apparently working email for 74% of papers, either in the article itself or by searching online. After requesting the data, they received 101 data sets (19%) and were told that another 20 (4%) were still in use and could not be shared. So, in total, 23% of data sets were still usable or extant.

For papers where the authors gave the status of their data, the odds of a data set being extant fell by 17% a year since publication. What’s more, the odds that the team could find a working email address for the first, last or corresponding author on a paper fell by 7% a year.
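
To see what a 17% yearly fall in odds implies, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not the team's analysis: it assumes the decline compounds at a constant rate each year, and the starting odds of 4 to 1 are a hypothetical value chosen for the example, not a figure from the study. Note that odds are not the same as probability, so the sketch converts between the two.

```python
import math

# Odds fall by 17% a year (figure from the article), so each year
# the odds are multiplied by 0.83. Constant compounding is an assumption.
YEARLY_ODDS_FACTOR = 1 - 0.17


def odds_after(initial_odds: float, years: float) -> float:
    """Odds that a data set is extant after a number of years."""
    return initial_odds * YEARLY_ODDS_FACTOR ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (e.g. 4 means 4 to 1) into a probability."""
    return odds / (1 + odds)


def odds_half_life() -> float:
    """Years for the odds to halve under constant compounding."""
    return math.log(0.5) / math.log(YEARLY_ODDS_FACTOR)


if __name__ == "__main__":
    initial_odds = 4.0  # hypothetical starting odds, not from the study
    for years in (0, 5, 10, 20):
        odds = odds_after(initial_odds, years)
        prob = odds_to_probability(odds)
        print(f"{years:2d} years: odds {odds:.2f}, probability {prob:.0%}")
    print(f"odds halve roughly every {odds_half_life():.1f} years")
```

Under these assumptions the odds halve roughly every 3.7 years, whatever the starting value.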

‘I’m surprised the numbers are not higher,’ says Peter Murray-Rust of the department of chemistry at the University of Cambridge, UK. He estimates ‘data decay’ at around 50% a year. And this is worse in chemistry than in life sciences, he adds. ‘Chemists hate sharing data,’ he says. ‘For example, in computational chemistry and materials science, essentially no primary data is published.’ An important step towards addressing this problem is a radical rethinking of how graduate students manage their data, he suggests.