Predictions based on numerical measures of research quality were ‘wildly inaccurate’, say mathematicians


The researchers said tossing a coin would produce more accurate predictions

A team of mathematicians who used metrics to predict the outcomes of the UK’s national assessment of research in 2014 have reported that their results were ‘wildly inaccurate’. The Research Excellence Framework (REF) relies on peer review, and some have claimed that using metrics, such as citations, would be simpler and cheaper. But university managers would get more accurate predictions by tossing a coin, the researchers claim.

Ralph Kenna and colleagues at the Applied Mathematics Research Centre at Coventry University, UK, used a measure called the departmental Hirsch index, or departmental h-index, to predict REF 2014 outcomes. This attempts to measure the productivity and citation impact of a department as a whole. They looked at research groups in biology (31 departments), chemistry (29), physics (32) and sociology (25), but their predictions came close neither to the overall REF outcomes nor to how institutions moved in the rankings relative to the 2008 exercise.
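For readers unfamiliar with the measure, the standard Hirsch construction gives a feel for what such an index captures: a set of papers scores h if it contains h papers each cited at least h times. The sketch below, in Python with invented citation counts, applies that construction at department level; the paper’s own departmental variant may be defined or normalised somewhat differently.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts, one entry per paper from a department
dept_citations = [45, 32, 31, 20, 12, 9, 6, 3, 1, 0]
print(h_index(dept_citations))  # prints 6: six papers have at least six citations
```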

‘We found that the h-index is quite useless as a tool for predicting REF results,’ says Kenna. ‘Our recommendation is to forget about metrics as a proxy to REF.’

The team also investigated whether the h-index could be used as a ‘navigator’ to help managers gauge whether their university will go up or down in the rankings. Kenna says: ‘We found that they are useless there too. In fact sometimes they are worse than useless – correlation coefficients are negative in the case of chemistry and sociology [a coefficient close to one would indicate a useful predictor].’
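To illustrate what a negative coefficient means here, the following minimal sketch compares a hypothetical h-index-based ranking with a hypothetical REF-based ranking using Spearman’s rank correlation. The figures are invented for illustration and the paper’s exact methodology may differ.

```python
from scipy.stats import spearmanr

# Hypothetical rankings for five departments: predicted order from the
# departmental h-index versus the order actually produced by the REF.
h_index_rank = [1, 2, 3, 4, 5]
ref_rank     = [3, 5, 1, 4, 2]

rho, p_value = spearmanr(h_index_rank, ref_rank)
print(rho)  # a value near +1 would mean the h-index tracks the REF ranking;
            # a value near zero or below means it is useless or worse as a guide
```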

The team makes a number of caveats, noting that the list of submitters to REF 2014 is different to that of 2008. It was also not possible to obtain the citation data, and therefore calculate h-indices, for some institutions. This reduction in dataset size can affect correlation coefficients, they note.

Their conclusion is that quantitative metrics could be useful if used ‘in the correct manner’ by informed subject experts. But over-reliance on a single metric by persons who are not experts could be ‘misleading’.

While the paper shows very clearly that the departmental h-index is not a good predictor of REF score, it does not say whether it is a better measure of a department’s research quality than the REF, notes Tom Welton, dean of the faculty of natural sciences at Imperial College London, UK. ‘It would perhaps have been remarkable had there been strong correlations. The two are measuring very different things,’ he says. ‘The departmental h-index takes into account all of the outputs from a department and tells you how much the people actually working in the same area used these to inform their own work. The REF analysis tells you what a group of identified experts thought of a selected subset of these outputs. The latter tells you about individual acts of excellence, whereas the former tells you about strength in depth.’ Welton adds that both should be equally valued in any quality-based research funding mechanism.