Replication, in theory

Experiments are seldom replicated by different research teams, says Philip Ball. Why is this and does it really matter?

What’s wrong with this claim? ‘Replication of results is a crucial part of the scientific method. Experimental errors come rapidly to light when researchers prove unable to reproduce the claims of others. In this way, science has a built-in mechanism for self-correction.’

The insistence on replication - as the motto of the Royal Society puts it, ‘take no one’s word for it’ (Nullius in verba) - has indeed long been one of science’s great strengths. It explains why pathological science such as cold fusion and polywater was rather quickly consigned to the dustbin, while equally striking claims such as high-temperature superconductivity have entered the textbooks.

But too often this view of the ‘scientific method’ - itself a slippery concept - is regarded as a regular aspect of science in action, rather than an expression of the ideal. Rather few experiments are replicated verbatim, as it were, not least because scientists are too busy, and their field too competitive, to spend time doing what someone has already done. Important claims are bound to get checked as others rush to follow up on the work, but mundane stuff will probably never be tested - it will simply sink unheeded into the literature.

No one should be surprised or unduly alarmed at that - if work isn’t important enough to warrant replication, it matters little if it is flawed. And although the difficulty of publishing negative results probably hinders the correction process and favours exaggerated claims, information technologies might now offer solutions.¹ What matters more is that replication isn’t just a problem in practice; it’s a problem in theory.

The concept emerged along with experimental science itself in the late sixteenth century. Before that, experiments - when they were done at all - were typically considered not a test of your hypothesis but a demonstration that it was right. Even though the early experimentalists decided they needed to filter recipes and reports by attempting to verify them before recording them as fact, the tradition of experiment-as-demonstration persisted for a long time. Many of the celebrated trials shown to the Fellows of the Royal Society were like that.

But it would be wrong to suppose that the failure of an experiment to verify a hypothesis or to replicate a prior claim should be grounds for their rejection. Robert Boyle appreciated this in his ‘Two essays, concerning the unsuccessfulness of experiments’ (1661). There are many reasons, he wrote, why an experiment might not work as anticipated: the equipment might be faulty, or the reagents not fresh, for example. That was amply borne out (albeit in reverse) by the recent discovery that a crucial step (first reported in 1918) in the alleged total synthesis of quinine by Robert Woodward and William Doering in 1944 depended on a catalyst being aged.² The very fact that it took 90 years to test that step is itself a comment on how replication really functions in science.

The problem of replication was highlighted by Boyle’s own famous experiments with the air pump. By raising the possibility of a vacuum, these studies posed a serious challenge to the prevailing Aristotelian philosophy. So the stakes were very high. But because of the imperfections of the apparatus, it was no easy matter even for Boyle to reproduce some of his findings. And because the air pump was a hugely sophisticated piece of scientific kit - it has been dubbed the cyclotron of its age - it was very expensive, so very few others were in a position to try the experiments. Even if they did, the designs differed, so one couldn’t be sure that the same procedures were being followed.³ That essentially no replications could be attempted without first-hand experience of Boyle’s instrument reflects today’s situation, in which hardly any complicated experiment can be replicated reliably without direct contact between the labs involved. Even then, the only way to calibrate your apparatus may be against that whose results you’re trying to test.

Which raises the question: if your attempted replication ‘fails’, where is the error? Have you neglected something? Or was the original claim wrong? Or was it right for the wrong reasons? The possibilities are endless.

Science makes progress regardless, and what is perhaps surprising is that the ‘scientific method’ remains so effective when it is in truth ramshackle, makeshift and logically shaky.

These issues seem more pertinent than ever. Who, for example, is going to check the findings from the Large Hadron Collider?