A tool that uses natural language processing and machine learning is being rolled out at journals to automatically flag reproducibility, transparency and authorship problems in scientific papers.

The tool, Ripeta, has been run on millions of journal papers since its release in 2017, but its creators have now enabled the latest version to be run on papers before peer review. In August, Ripeta was integrated with the widely used manuscript submission system Editorial Manager in a bid to identify shortcomings in manuscripts before they are sent out for review. For now, the tool’s creators won’t disclose which journals are using Ripeta, citing commercial confidentiality.

Ripeta sifts through papers to identify ‘trust markers’ – indicators of transparency such as data and code availability statements, open access statements, ethical approvals, author contribution statements, repository notices and funding declarations.
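Ripeta’s methods are proprietary, but the kind of check described here can be illustrated with a minimal keyword-matching sketch. Everything below – the function name, the marker phrases, the marker categories – is a hypothetical simplification, not Ripeta’s actual implementation, which relies on trained language models rather than fixed patterns:

```python
import re

# Hypothetical marker phrases for a few trust-marker categories.
# A real NLP tool would use trained models, not simple patterns.
TRUST_MARKERS = {
    "data_availability": r"data availability statement|data are available",
    "code_availability": r"code availability|code is available",
    "funding": r"funding|this work was supported by",
    "ethics": r"ethical approval|ethics committee",
    "author_contributions": r"author contributions?",
    "competing_interests": r"competing interests?|conflicts? of interest",
}

def flag_trust_markers(text: str) -> dict:
    """Return which trust-marker categories appear in a manuscript's text."""
    lowered = text.lower()
    return {name: bool(re.search(pattern, lowered))
            for name, pattern in TRUST_MARKERS.items()}

sample = ("Funding: this work was supported by grant X. "
          "Data availability statement: data are available on request.")
print(flag_trust_markers(sample))
```

Run on the sample text, the sketch reports the funding and data availability markers as present and the others as missing – the kind of checklist output an editor could act on before sending a paper out to reviewers.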

From October 2022, the technology behind Ripeta was also integrated into the scholarly database Dimensions, giving users access to metadata about trust markers – for a fee – in 33 million academic papers published since 2010.

An upcoming white paper reporting trends from the 33 million Dimensions records reveals that the proportion of academic papers containing funding statements rose steadily from just over 30% in 2011 to just under 50% in 2021. Over the same period, the share carrying competing interest statements climbed by just over 30 percentage points, to just under 40%. Ethical approval and author contribution statements shot up from around 5% of scholarly papers in 2011 to more than 25% in 2021. And while the proportion of papers containing data availability statements went from close to zero in 2011 to more than 20% in 2021, dedicated code availability sections have yet to see common adoption, emerging only in the last three years.

‘It would be like having an app within a smartphone platform,’ says Leslie McIntosh, chief executive officer and founder of US-based Ripeta. ‘The hope is that people would use this and improve the manuscript before they get published.’

If Ripeta prompts researchers to fix issues such as code and data availability statements or ethical approval statements, that would free up time for editors and peer reviewers to focus on the actual science, McIntosh says. ‘Just because they have all the pieces [it doesn’t] mean that they actually have a well stated hypothesis and their methods are good.’

Some academic publishers have rolled out their own internal AI systems to flag potential conflicts of interest, authorship issues, or other breaches of research integrity.

McIntosh says her customers include research institutions, funding agencies, policymakers and individual researchers. ‘Checking for nefarious things is hot’ at the moment, McIntosh says. ‘The way that we’re checking for that and being able to leverage Dimensions and [identify] potential nefarious networks is actually very unique.’

Michèle Nuijten, a meta-science researcher at Tilburg University in the Netherlands who helped create the algorithm statcheck, which flags statistical errors in scientific studies, says it’s a great idea to spot shortcomings in papers before publication. ‘I do hope that these kinds of tools are here to stay because we need some help in dealing with the enormous amount of output.’

One downside of AI tools is that they’re not completely transparent and it’s often unclear how they work. McIntosh agrees that all software is biased, both by the data it is trained on and by the implicit biases of the people who create it. To minimise these biases, she argues, there always needs to be manual data validation and curation, with humans kept in the loop to review findings and make the final decisions.