‘Science is broken’ is such a common cry these days that you might wonder why anyone is still doing it, or why they are paid any heed. The truth is, of course, that most of science is far from broken, but is instead doing remarkable things like detecting gravitational waves, improving cancer therapies or taking snapshots of single molecules reacting.

There are certainly good grounds for the somewhat sceptical response to a recent repeat of this warning by social scientist Daniel Sarewitz.[1] ‘Science isn’t self-correcting, it’s self-destructing’ ran the headline above a now-familiar litany of allegations: retractions and misconduct are increasing while even more published claims collapse or evaporate on close scrutiny. Sarewitz quotes The Lancet editor-in-chief Richard Horton: ‘The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue.’

That figure is debated, not least because it tends to refer to the biomedical and psychology literature where small sample sizes, tiny effects and vague hypotheses are a constant bugbear. But at the root of Sarewitz’s charges is an accusation that, perhaps in less extreme and confrontational terms, is reiterated by many scientists: ‘Part of the problem surely has to do with the pathologies of the science system itself. Academic science, especially, has become an onanistic enterprise worthy of [Jonathan] Swift or [Franz] Kafka. As a university scientist you are expected to produce a continual stream of startling and newsworthy findings.’

The problem is not so much the pressure – although that takes its toll, particularly on young scientists – but the incentives. ‘The professional incentives for academic scientists … are perverse and crazy’, writes Sarewitz, ‘and promotion and tenure decisions focus above all on how many research dollars you bring in, how many articles you get published, and how often those articles are cited.’ And, he might have added, where they are published.

Rank measures

These incentives exist for departments and institutions too. The obsessive ranking of universities is one of the most insidious. There was great celebration in China when four of its universities featured in the top 50 of the Times Higher Education’s world rankings. That Oxford University and the California Institute of Technology swapped places in the first and second slots won’t cause much consternation beyond Pasadena, but such games would be easier to dismiss as absurd if their consequences did not fall on the shoulders of academics.

If scientists, funders, research agencies and tenure committees were more familiar with the vast economic literature on incentives, they would be more concerned. In short, their relationship with performance is complicated, and poorly chosen incentives can distort behaviour in unproductive ways. Political scientist Paul Smaldino and anthropologist Richard McElreath of the University of California at Davis make this explicit in a preprint,[2] in which they use a simple model to show that rewards for ‘novel’ findings can promote the spread of bad scientific technique through a kind of natural selection, even when there is no conscious malpractice. ‘Whenever quantitative metrics are used as proxies to evaluate and reward scientists,’ they write, ‘those metrics become open to exploitation if it is easier to do so [such as by conducting shoddy research that brings publications] than to directly improve the quality of research.’
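Their argument is essentially evolutionary, and a toy simulation makes the mechanism vivid. The sketch below is a loose illustration in the spirit of their result, not a reproduction of their model: every parameter here (the publication payoff for low effort, the noise, the copying rule) is my own assumption. Labs that cut methodological corners publish more, the most productive lab is imitated, and average rigour drifts downward with nobody consciously cheating.

```python
import random

random.seed(0)  # reproducible toy run

N_LABS = 100
N_STEPS = 5000  # one lab replaced per step (a Moran-style process)

# Each lab is characterised only by its methodological "effort" in [0, 1].
labs = [random.random() for _ in range(N_LABS)]

def publications(effort):
    """Assumed payoff: lower effort yields more (shoddier) papers, plus noise."""
    return (1.0 - effort) * 10 + random.gauss(0, 1)

def step(labs):
    """Replace the least productive lab with a noisy copy of the most
    productive one - selection on output alone, no conscious malpractice."""
    scores = [publications(e) for e in labs]
    best = labs[scores.index(max(scores))]
    worst_i = scores.index(min(scores))
    new = labs[:]
    # Imitation with a little mutation, clamped to [0, 1].
    new[worst_i] = min(1.0, max(0.0, best + random.gauss(0, 0.02)))
    return new

start = sum(labs) / N_LABS
for _ in range(N_STEPS):
    labs = step(labs)
end = sum(labs) / N_LABS

print(f"mean methodological effort: {start:.2f} -> {end:.2f}")
```

Because the metric being selected on (publication count) is only a noisy, inverted proxy for rigour, selection reliably erodes mean effort over the run. That is Goodhart's Law enacted by copying success, not by cheating.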

This is simply an expression of Goodhart’s Law, familiar to economists, which engineers Marc Edwards and Siddhartha Roy of Virginia Tech have expressed succinctly in a recent paper: ‘when a measure becomes a target, it ceases to be a good measure.’[3] If we took that idea seriously, we would need to ditch h-indices (for which the literature on gaming the system is huge), university and departmental rankings, and journal impact factors.

Purging the problem

Edwards and Roy provide one of the most sober and sobering indictments of inappropriate incentivisation. Their tabulation of ‘growing perverse incentives in academia’ is awfully – and indeed painfully – plausible in its description of intended and actual effects. Rewarding researchers for their number of publications doesn’t improve productivity but lowers standards; rewarding them for citations leads to self-citation and that familiar comment from referees: ‘the pioneering work of Baggins et al. should certainly be cited.’ Edwards and Roy tellingly observe that there is probably an optimum for scientific productivity in the balance between research quality and quantity – but if so, that optimum evolved under a system without the perverse incentives of today.

What’s to be done? Edwards and Roy have some good suggestions, including a proper assessment of the scope of the problem, input from experts on incentivisation and its pitfalls, and guidelines on best practice. But I suspect that what’s really needed is open revolt. What if those top 50 universities, which have little to lose, refused outright to play the game – to ‘purge the conversation’ of impact factors (as the American Society for Microbiology is laudably aiming to do) and h-indices, and to ignore or boycott national and international grading exercises like the UK’s Research Excellence Framework?

The effectiveness of research obviously needs to be monitored somehow, and objective methods can help to avoid the gender and racial prejudice that is still rife in research appointments. But when assessment and evaluation turn into incentives, everyone seems to suffer.