Be first, or be forgotten
Openly sharing information and research is changing chemistry, says Jean-Claude Bradley
Almost a decade ago, the term ‘podcasting’ captured the public imagination. Sharing audio and video files over the internet had been possible for some time, but the tools for creating, distributing and following such media had finally become accessible to the average internet user. Predictably, different factions across the ideological spectrum saw different opportunities. At one end, some saw new ways to monetise their skills and products. At the other, a second group recognised a means for the radical sharing of knowledge. For the majority, the opportunities lay somewhere in between: freely sharing some content in the hope of selling the rest, for example, or sharing content freely but with restrictions on its use.
I watched these factions form in the education community. Many colleagues were reluctant to share their intellectual property freely, such as audio or video recordings of their lectures. My mantra at the time was ‘Be first, or be forgotten’: once a few good teachers decide to share their lectures without restriction, it is only a matter of time before the internet is saturated with free knowledge for all. Once that happens (and it has), the value of any one teacher’s private content drops dramatically, whether or not it is later shared. But people remember those who were first, and those pioneers have indeed had a unique opportunity to shape how science teaching has evolved.
Over the past few years the same scenario has been unfolding in chemistry research and chemical information. The social web has evolved into a more semantically aware, machine-readable web, and an ecology of services and tools has flourished within it. Chemical information can now be communicated to humans and machines alike, at every point on the spectrum of openness. Obtaining chemical information can still be an expensive, if necessary, part of scientific research, but at the radically open end of this spectrum the efforts of a few are steadily reversing that situation.
As an example, my group and our collaborators Andrew Lang and Antony Williams have created and curated open collections of solubility and melting point data. We now have over 27,000 melting points, enough to build models that provide usable estimates where no experimental value exists, as is the case for a good fraction of organic chemical space. Because these datasets and models carry no restrictions and are available through various machine-readable interfaces, anyone in the world can obtain the melting point of an organic compound without financial or computational barriers. I expect that within a few years virtually any organic chemical property will be available, either as a collection of open measurements or as ‘good enough’ estimates.
Sourcing those melting point data was arduous: just as teachers resisted sharing their lectures, most companies and agencies we contacted were unwilling to donate their datasets to the public domain. But there were surprises too. A tremendous amount of painstakingly collected chemical information remains unavailable simply for want of a request. We obtained Alfa Aesar’s entire collection of melting points (well over 10,000) just by asking for it.
With similar donations, we now have enough redundant measurements across a large portion of the collection to curate much of it with confidence. And with a reasonably good model in hand, contacting further sources is no longer critical.
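To illustrate why redundant measurements make curation possible, here is a minimal Python sketch of one way to cross-check them. It is an assumption-laden illustration, not the procedure behind the collections described here: the function name curate_melting_points, the 5 °C tolerance, the median-based consensus rule, and the compound identifiers and values in the usage lines are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def curate_melting_points(records, tolerance_c=5.0, min_measurements=2):
    """Hypothetical curation rule for redundant melting point measurements.

    records: iterable of (compound_id, melting_point_in_celsius) pairs.
    Values are grouped per compound; any value further than tolerance_c
    from the group's median is flagged, and the median of the remaining
    values is reported as the consensus.
    """
    by_compound = defaultdict(list)
    for compound_id, mp_c in records:
        by_compound[compound_id].append(mp_c)

    curated = {}
    for compound_id, values in by_compound.items():
        if len(values) < min_measurements:
            # Too few measurements to cross-check: keep, but mark as uncurated.
            curated[compound_id] = {"status": "uncurated",
                                    "consensus": median(values),
                                    "kept": list(values), "flagged": []}
            continue
        centre = median(values)
        kept = [v for v in values if abs(v - centre) <= tolerance_c]
        flagged = [v for v in values if abs(v - centre) > tolerance_c]
        if not kept:
            # Even-sized group whose middle values disagree: needs human review.
            curated[compound_id] = {"status": "conflict", "consensus": None,
                                    "kept": [], "flagged": flagged}
        else:
            curated[compound_id] = {"status": "curated",
                                    "consensus": median(kept),
                                    "kept": kept, "flagged": flagged}
    return curated


# Illustrative, made-up records (not real measurements).
records = [
    ("compound_A", 80.2), ("compound_A", 80.5), ("compound_A", 79.9),
    ("compound_A", 95.0),                          # disagrees with the others
    ("compound_B", 120.0), ("compound_B", 150.0),  # two values, no agreement
    ("compound_C", 42.0),                          # only one measurement
]
result = curate_melting_points(records)
for compound_id, entry in result.items():
    print(compound_id, entry["status"], entry["consensus"], entry["flagged"])
```

The median is used rather than the mean so that a single wildly wrong entry cannot drag the consensus towards itself; a compound with only one measurement cannot be checked this way at all, which is why it is left marked as uncurated.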
Free for all
Technology also enables openness in research itself: sharing work as it happens, almost in real time. This approach has proven remarkably fault-resistant because it requires making public not only the experimental details but also all of the raw data used to draw inferences. Any researcher can step through every detail and make an independent evaluation. Students make mistakes, as do professors, and in the past trusting people may have been a necessary evil. Today it is a choice. Ideally, trust should have no place in science.
There is also a tremendous amount of useful information in reactions, or attempted reactions, that is never shared. Whether or not a reaction ‘succeeds’, a carefully recorded execution can provide valuable information. Excellent tools and standards already exist for semantically tagging chemical reactions and properties, so that an experiment can be discoverable as soon as it has started.
Open chemistry will not appeal to everyone. But it does not need unanimity; the actions of a few are all that is required to drive its progress. And its benefits are available to the spectrum’s whole population, those who share and those who withhold alike. Indeed, a spectrum of participation is both necessary and useful. Open chemistry is inherently inclusive.
Our experience with the melting point data was a genuine win-win for the chemistry community and for Alfa Aesar. The provenance information in our collections leads directly to their catalogue, a form of free marketing and advertising. And it probably does no harm that their contribution is central to the telling of this story.
Be first, or be forgotten.
Jean-Claude Bradley is associate professor of chemistry at Drexel University, US