Is blockchain a new paradigm in safeguarding science, or does it just tie our hands?

In 1957, Berni Alder and Tom Wainwright reported the first ever molecular dynamics calculation, using what was then the relatively new technology of ‘electronic computers’.1 On a lumbering Univac machine, they carried out calculations at a blistering 300 collisions per hour. One could have been forgiven then for being pessimistic about the prospects of modelling with electronic computers. But it was groundbreaking work that laid the foundations of modern molecular dynamics.

A similar pessimism might surface about a recent study by Alexander Ashmore and Magnus Hanson-Heine. Like Alder and Wainwright before them, Ashmore and Hanson-Heine have achieved a groundbreaking first using a computer that is as new, and as slow, as the Univac was in the 1950s: the Ethereum blockchain.2 Using a grand total of 400 timesteps at 0.1 fs increments, the pair modelled the vibration of a carbon monoxide molecule over 40 fs. Compared to modern computing systems that can model hundreds of millions of atoms over nanosecond timescales, the numbers in Ashmore and Hanson-Heine’s study look unimpressive to say the least. So why consider it groundbreaking?
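The calculation itself is conceptually simple. A minimal sketch of this kind of simulation, using velocity-Verlet integration of a harmonic CO bond (illustrative force constant and starting displacement; this is not the authors' actual code, which runs on the Ethereum virtual machine):

```python
# Physical constants and assumed parameters (illustrative values)
AMU = 1.66054e-27                            # kg per atomic mass unit
K = 1857.0                                   # N/m, approximate CO stretch force constant
MU = (12.0 * 16.0) / (12.0 + 16.0) * AMU     # reduced mass of carbon monoxide

DT = 0.1e-15                                 # 0.1 fs timestep, as in the study
STEPS = 400                                  # 400 timesteps -> 40 fs in total

def simulate(x0=5e-12, v0=0.0):
    """Velocity-Verlet integration of the bond displacement x (metres)
    from equilibrium; returns the trajectory as a list of positions."""
    x, v = x0, v0
    a = -K * x / MU                          # harmonic restoring force / mass
    traj = [x]
    for _ in range(STEPS):
        x += v * DT + 0.5 * a * DT * DT      # position update
        a_new = -K * x / MU
        v += 0.5 * (a + a_new) * DT          # velocity update with averaged force
        a = a_new
        traj.append(x)
    return traj

traj = simulate()
# The vibrational period 2*pi*sqrt(MU/K) is roughly 15.6 fs, so the
# 40 fs trajectory spans about two and a half oscillations.
```

With these parameters the molecule completes only a couple of vibrations in the whole run, which is exactly the modest scale of the blockchain study.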
Never break the chain

The term ‘blockchain technology’ refers loosely to networks of computers that can collaborate to maintain databases, or perform computations, without requiring a central administrator. The idea was first introduced by the pseudonymous Satoshi Nakamoto with the digital currency Bitcoin. At the time, Nakamoto limited the permitted computations to simple commands deemed useful for financial transactions, but a few years later a Canadian programmer, Vitalik Buterin, developed Ethereum, which imposes no such limits (it is ‘Turing complete’, in the language of computer theory).

The Ethereum blockchain is thus able to function as a large, massively distributed computer that is owned, managed and operated by no one in particular. A direct result of this decentralised administration is that no one has the ability to delete or modify data once it has been uploaded to the network – a feature commonly referred to as immutability. Consequently, fears about information censorship, data tampering, or even simply losing data, can be mitigated by using blockchains. That has made the technology attractive for a wide variety of applications where the data is both very important and potentially contentious (for example, patents, land titles, medical records, carbon credits and supply chains).
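The mechanism behind immutability can be illustrated with a toy hash chain, in which each block stores a cryptographic hash of its predecessor. This is a simplified sketch, not Ethereum's actual data structures or consensus protocol:

```python
import hashlib

GENESIS = "0" * 64  # placeholder 'previous hash' for the first block

def block_hash(prev_hash: str, data: str) -> str:
    """A block's hash covers both its data and the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

def build_chain(records):
    """Link each record to its predecessor via hashes."""
    chain, prev = [], GENESIS
    for data in records:
        h = block_hash(prev, data)
        chain.append({"data": data, "prev": prev, "hash": h})
        prev = h
    return chain

def verify(chain) -> bool:
    """Recompute every link; editing any earlier block breaks all later hashes."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block_hash(prev, block["data"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = build_chain(["step 1 results", "step 2 results", "step 3 results"])
assert verify(chain)
chain[0]["data"] = "doctored results"   # a retroactive edit...
assert not verify(chain)                # ...is immediately detectable
```

Because every block's hash depends on everything before it, retroactively altering old data requires recomputing – and getting the rest of the network to accept – every subsequent block, which decentralisation makes infeasible in practice.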

One can imagine a future where the chemistry community demands that all simulations use blockchains

To the computational chemist, blockchain technology presents several advantages. Firstly, it offers unparalleled improvement in reproducibility: not only is the simulation code developed by Ashmore and Hanson-Heine and their resulting data publicly available forever, but the hardware they used to run the code will also be the same forever. As many in the computational chemistry community know, code written decades ago, which ran on hardware that is now obsolete or non-existent, is more reproducible in principle than in practice. Secondly, because computations and data uploaded to blockchains are timestamped, there can be no argument about who performed a simulation first if all parties used blockchains. A closely related benefit is increased trust in modelling’s predictions of experimental phenomena; robust timestamping makes it clear whether the experimental result or the ‘prediction’ came first. Lastly, when scientific data becomes a focal point in contentious political issues such as climate change, blockchains can act as a safe haven for information that might be censored or erased elsewhere.
Chain rule?

Yet we are a long way from routine blockchain computation in chemistry, because these benefits come with some significant costs. For one, the Ethereum virtual computer is many orders of magnitude slower than a conventional computer (and the comparison is even more stark when compared to conventional distributed computing systems). And although improvements in blockchain technology may significantly increase its speed, it will always be slower than traditional computers. This is because the distributed computers in a blockchain must each replicate every computation, rather than sharing the workload among them as conventional distributed networks do. This redundancy is critical to the reliability and robustness of blockchains, but also a fundamental constraint on their efficiency. 
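The scale of that constraint can be made concrete with a back-of-the-envelope comparison (the numbers here are hypothetical, not measured Ethereum figures):

```python
# Suppose a simulation needs W units of work and the network has N nodes.

def total_work_replicated(w: float, n_nodes: int) -> float:
    """Blockchain-style execution: every node redoes the full computation,
    so total work scales with the size of the network."""
    return w * n_nodes

def total_work_shared(w: float, n_nodes: int) -> float:
    """Conventional distributed computing: nodes split the workload, so
    total work stays ~w while wall-clock time drops towards w / n_nodes."""
    return w

W, N = 1.0, 10_000  # hypothetical workload and node count
overhead = total_work_replicated(W, N) / total_work_shared(W, N)
print(overhead)  # the network performs N times the work of a shared system
```

However fast individual nodes become, this N-fold overhead is structural: it is the price of having every participant independently verify every result.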

There is also the question of whether such radical transparency is even desired – consider that simulation code containing mistakes and typos cannot be modified once uploaded to a blockchain. Our academic culture would need to become significantly more forgiving of human error before the community would jump into a paradigm where our elementary mistakes are on display for eternity.
Blockchains might also be a needlessly complex solution to our problems. Legacy code and reproducibility issues, for example, are addressed by a multitude of other solutions. These days, software engineers go to extensive lengths to ensure code is reproducible even on disparate hardware, and the advent of cloud infrastructure – where diverse computational hardware can be instantly summoned on demand – has accelerated this trend. Also, software ‘containerisation’ approaches can package up code so that applications run reliably in any computing environment, just as shipping containers abstract what is being shipped from how it gets shipped. 

Ashmore and Hanson-Heine were motivated by the ‘replication crisis’ in science and cite a meta-analysis that indicates ‘roughly 2% of scientists surveyed had admitted to fabricating, falsifying or modifying data or results at least once’.3 By making retroactive tampering infeasible and improving reproducibility in other ways, blockchains promise to help with at least some aspects of this crisis. Ultimately, however, any solution to the replication crisis will require a culture shift.

Still, one can imagine a future where, perhaps after reeling from one too many scandals, the chemistry community and the journals we publish in demand that all data and simulations use blockchains, just as today we upload new compounds to databases. Perhaps Ashmore and Hanson-Heine have started what may eventually become a new standard for computational science.

Chris Wilmer is an associate professor and Daniel Salmon is a postgraduate student at the University of Pittsburgh, US