Changing authorship patterns mean that the h-index is no longer an effective way to gauge a scientist’s impact, according to a new study by data scientists at technology giant Intel.

First proposed in 2005 by the US-based physicist Jorge Hirsch, the h-index is a measure based on a researcher’s most highly cited papers. A scientist with an h-index of 30 has published 30 papers that have each been cited at least 30 times.
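In practical terms, the definition boils down to a simple threshold rule. The short sketch below is purely illustrative (it is not code from the study) and computes an h-index from a list of per-paper citation counts:

```python
# Illustrative sketch of the h-index definition above (not the study's code):
# the h-index is the largest h such that h of a researcher's papers have
# each been cited at least h times.

def h_index(citations: list[int]) -> int:
    """Compute the h-index from a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        # The paper in position `rank` must have at least `rank` citations.
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Example: five papers cited [10, 8, 5, 4, 3] times give an h-index of 4.
print(h_index([10, 8, 5, 4, 3]))  # -> 4
```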

Due to its relative simplicity, the h-index has become a widely used tool to quantify scientists’ impact in their fields. But its use has always been controversial. ‘Since its introduction, it has been highly criticised by professional bibliometricians,’ says Lutz Bornmann, an expert on research evaluation based at the Max Planck Society in Munich, Germany.

Critics of the h-index point out that it unfairly penalises early-career researchers, who have had less time than their older colleagues to publish papers and build up citations. The metric also fails to account for differing publishing rates across academic fields and can even encourage bad publishing practices, such as excessive self-citation and the inclusion of authors who contributed little to a paper. The h-index also completely ignores important aspects of academic life beyond publishing – for example, leadership roles, teaching or outreach. ‘Nevertheless, it has become a popular indicator especially among amateur bibliometricians,’ says Bornmann.

Investigating h

Despite these issues, the h-index still features on popular scholarly databases and in some cases can influence important decisions on recruitment and funding that affect researchers’ careers. Vladlen Koltun, chief scientist at Intel’s Intelligent Systems Lab, explains that he and his colleagues noticed inconsistencies when browsing researchers’ h-indices across various fields.

‘We set out to probe the h-index, and we asked whether it is really the best metric we can come up with – because it is being used, whether we like it or not,’ says Koltun. ‘It is being used for educational purposes the way we were using it, but also, perhaps more importantly, it’s being used by various committees that evaluate scientists for awards, for promotions and so forth.’

Koltun and his colleague David Hafner used computational tools to analyse citation data from millions of articles across four different scientific fields. ‘We collected data with temporal annotations, so we can trace the evolution of a researcher’s h-index over time – we know what the researcher’s h-index was in 2010, 2019, 1998,’ says Koltun. ‘And we did this on the scale of thousands of researchers.’

They then cross-referenced the data against lists of winners of various scientific prizes and inductees to national academies, which Koltun reasons serve as evidence of a scientist’s reputation within their community.

‘So we can examine correlation in real time – does the h-index correlate with a reputation at present?’ explains Koltun. ‘But even more interestingly to me, we could ask questions such as, “Does the h-index predict reputation in the future?” Because that’s actually how it’s being used … the most consequential use of these metrics is for making decisions such as whom should we hire?’

Predictive power palls

According to Koltun’s analysis, when the h-index was first created, it was a reasonably good indicator of who might win future awards. But this ‘predictive power’ started to wane over the years. ‘To the point that now the correlation between rankings induced by the h-index in physics, for example, and rankings induced by awards and recognition by that academic community – the correlation is zero, there is just no correlation,’ says Koltun.

One reason for this is the increasing number of large scientific collaborations, Koltun explains. He points out that hyper-authorship – a growing phenomenon where global research consortia produce papers with thousands of co-authors – enables people to rack up enormous h-indices very quickly.

‘What our data also shows is that the hyper-authors are simply an extreme manifestation of a broader shift in authorship patterns and publication patterns. Generally, people are publishing more, people are co-authoring more, author lists are growing,’ says Koltun. ‘And if you don’t take that into account, what you get is an inflation in the metrics and inflation in the h-indices across the board.’

Koltun and Hafner propose a new metric, the ‘h-frac’, to solve this issue. The h-frac allocates a proportion of citations to each author, depending on the number of co-authors on a paper. ‘It’s more reliable than the h-index … Even when we go back to 2005 when the h-index was introduced, h-frac was already more reliable, but the gap has widened dramatically because the reliability of the h-index fell off a cliff,’ says Koltun.
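The paper sets out the exact formula; as a rough illustration of the idea described above, rather than the authors’ own implementation, one can split each paper’s citations evenly among its co-authors and then apply the same threshold rule as before:

```python
# Rough sketch of the idea behind h-frac, not the authors' exact formula:
# each paper's citations are divided evenly among its co-authors, and the
# h-index threshold rule is then applied to these fractional counts.

def h_frac(papers: list[tuple[int, int]]) -> int:
    """papers: list of (citation_count, number_of_authors) pairs."""
    # Fractional citation credit for each paper, largest first.
    fractional = sorted((cites / authors for cites, authors in papers),
                        reverse=True)
    h = 0
    for rank, credit in enumerate(fractional, start=1):
        if credit >= rank:
            h = rank
        else:
            break
    return h

# A solo-authored paper keeps all of its citations; a paper with 1,000
# co-authors contributes almost nothing per author.
print(h_frac([(100, 1), (100, 1000), (30, 2)]))  # -> 2
```

Under a scheme like this, a hyper-authored paper with thousands of co-authors adds only a tiny fractional credit per author, which is why such a metric is far less sensitive to the authorship inflation Koltun describes.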

The h-index and h-frac both seek to determine which researchers have made the greatest cumulative contribution to their field over their lifetime. But the Intel team are also keen to see whether similar measures can offer insight into which groups are currently carrying out the most innovative work, or which consistently produce ground-breaking results. In their latest study, currently available as a preprint ahead of peer review, Koltun and Hafner suggest another metric to address this, the Cap, which assesses how impactful a researcher’s work is relative to their publishing volume.

Since 2005, more than 50 alternative measures to the h-index have been proposed without any gaining practical significance, says Bornmann, who is unconvinced that any new variants will become important indicators. He points out that the Web of Science database recently adopted beamplots – a data visualisation tool that Bornmann’s team helped to develop, which illustrates a researcher’s publication history over time. Clarivate, who maintain Web of Science, hope that such tools will ‘steer us away from reduction to a single-point metric and force us to consider why the citation performance is the way it is’.

Koltun and Hafner acknowledge the calls to abandon simplified citation-based metrics and agree that ideal scenarios would involve in-depth assessment of researchers’ work. But with the use of such measures ‘as widespread as ever’, they argue that there is a need for better metrics. They hope that their findings ‘can inform the science of science and support further quantitative analysis of research, publication, and scientific accomplishment’.