An artificial intelligence system trained on almost 40 years of the scientific literature correctly identified 19 out of 20 research papers that have had the greatest scientific impact on biotechnology – and has selected 50 recent papers it predicts will be among the ‘top 5%’ of biotechnology papers in the future.1

Scientists say the system could be used to find ‘hidden gems’ of research overlooked by other methods, and even to guide funding decisions so that money is most likely to flow to promising research.

But it’s sparked outrage among some members of the scientific community, who claim it will entrench existing biases.

‘Our goal is to build tools that help us discover the most interesting, exciting and impactful research – especially research that might be overlooked with existing publication metrics,’ says James Weis, a computer scientist at the Massachusetts Institute of Technology and the lead author of a new study about the system.

The study describes a machine-learning system called Delphi – Dynamic Early-warning by Learning to Predict High Impact – that was ‘trained’ with metrics drawn from more than 1.6 million papers published in 42 biotechnology-related journals between 1982 and 2019.

The system assessed 29 different features of the papers in those journals, yielding a network of more than 7.8 million individual machine-learning ‘nodes’ and 201 million relationships between them.
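To make the ‘nodes and relationships’ idea concrete, here is a hypothetical sketch of how such a network of papers, authors and journals might be assembled. The schema and all identifiers below are illustrative guesses, not taken from the Delphi paper:

```python
import networkx as nx  # pip install networkx

# Hypothetical illustration of a bibliometric knowledge graph:
# papers, authors and journals as nodes, typed relationships as edges.
# This schema is invented for illustration; it is not Delphi's actual one.
G = nx.MultiDiGraph()

G.add_node("paper:123", kind="paper", year=2018)
G.add_node("author:ab", kind="author", h_index=12)
G.add_node("journal:xy", kind="journal", impact_factor=8.4)

G.add_edge("author:ab", "paper:123", relation="wrote")
G.add_edge("paper:123", "journal:xy", relation="published_in")
G.add_edge("paper:123", "paper:99", relation="cites")  # "paper:99" is created implicitly

print(G.number_of_nodes(), G.number_of_edges())  # 4 nodes, 3 relationships
```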

The features included standard metrics, such as the h-index of an author’s research productivity and the number of citations a research paper attracted in the five years after its publication. But they also included features such as how an author’s h-index had changed over time, the number and rankings of a paper’s co-authors, and several metrics about the journals themselves.
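As a reminder of how the simplest of these metrics works: an author’s h-index is the largest number h such that h of their papers have each received at least h citations. A minimal sketch in Python (the sample citation counts are invented for illustration):

```python
def h_index(citation_counts):
    """Return the h-index: the largest h such that at least
    h papers have h or more citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Five papers with these citation counts give an h-index of 3:
# three papers have at least 3 citations each, but not four with >= 4.
print(h_index([10, 8, 3, 2, 1]))  # -> 3
```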

The researchers then used the system to correctly identify 19 of the 20 ‘seminal’ biotechnology papers from 1980 to 2014 in a blinded study, and to select another 50 papers published in 2018 that they predict will be among the top 5% of ‘impactful’ biotechnology research papers in the years to come.

Weis says the one important paper that Delphi missed described the foundational development of chromosome conformation capture – methods for analysing the spatial organisation of chromosomes within a cell. It was missed in part because many of the citations it generated appeared in non-biotechnology journals and so were not in the system’s database.

‘We don’t expect to be able to identify all foundational technologies early,’ Weis says. ‘Our hope is primarily to find technologies that have been overlooked by current metrics.’

As with all machine-learning systems, due care needs to be taken to reduce systemic biases and to ensure that ‘malicious actors’ cannot manipulate it, he says. But ‘by considering a broad range of features and using only those that hold real signal about future impact, we think that Delphi holds the potential to reduce bias by obviating reliance on simpler metrics’. Weis adds that this will also make Delphi harder to game.

Weis says the Delphi prototype can easily be expanded into other scientific fields, initially by including additional disciplines and academic journals, and potentially other sources of high-quality research, such as the online preprint archive arXiv.

The intent is not to create a replacement for existing methods for judging the importance of research, but to improve them, he says. ‘We view Delphi as an additional tool to be integrated into the researcher’s toolkit – not as a replacement for human-level expertise and intuition.’

The system has already attracted some criticism. Andreas Bender, a chemist at the University of Cambridge, wrote on Twitter that Delphi ‘will only serve to perpetuate existing academic biases’, while Daniel Koch, a molecular biophysicist at King’s College London, tweeted: ‘Unfortunately, once again “impactful” is defined mostly by citation-based metrics, so what’s “optimized” is scientific self-reference.’

Lutz Bornmann, a sociologist of science at the Max Planck Society headquarters in Munich who has studied how research impact can be measured,2 notes that many of the publication features assessed by the Delphi system rely heavily on counting the citations that papers accrue. However, ‘the proposed method sounds interesting and led to first promising empirical results’, he says. ‘Further extensive empirical tests are necessary to confirm these first results.’