Simon Coles proposes an alternative approach to sharing scientific information. For what it’s worth…

The paradigm for the acquisition and dissemination of knowledge has remained the same for centuries. The researcher reads and digests accounts of others’ work in order to build hypotheses and develop approaches to address problems. In turn, similar accounts are produced, propagating the body of knowledge. Over the last 350 years, we have come to rely upon the written word for these accounts and on physical archival and distribution mechanisms to communicate this research. In short: the publishing industry.

The arrival of the internet and electronic publishing brought significant change. But while it enabled an increase in publications and, arguably, a reduction in costs, the articles themselves look much the same as they did 350 years ago. So have we really made use of this digital revolution? The internet gives us a very effective publishing and delivery system and has profound implications for how we can do science in the future. An example is the Dial-a-Molecule Grand Challenge, which aims to reduce the time taken to perform or refine the synthesis of a compound using an informatics-based approach founded on what is known from the literature. To do this, it is imperative to know the details of every reaction ever attempted (whatever the outcome). But we cannot hope to achieve such a goal if we persist with our reliance on the conventional scientific paper.

What is valuable?

The conventional publishing model is built upon adding value to the publishing process. The publisher adds value through the peer review ‘seal of approval’, and this self-assessing approach has enabled the publishing process, and thereby the body of literature, to grow to gigantic proportions. Inevitably, this leads to articles being ranked by perceived quality or novelty, and further value is then generated by linking related work and tracking citations.

This model relies on subscriptions to generate its revenue, but there is only so much money in the system. As the value and volume of the literature grow, it will arguably become unsustainable. The open access alternative has authors pay to publish, which is certainly cheaper than having every subscribing institution pay for the same article. But for such a dramatic change to work, it needs to occur in its entirety and simultaneously – akin to an entire nation switching to driving on the other side of the road overnight, and difficult to achieve in practice.

But the real tradable commodity or currency in the research economy is the data and information that we need to build new knowledge. If we reconsider the system in these terms and ask ‘what is a valuable item?’ and ‘to whom is it valuable?’, there emerges a possibility for a new economy based upon the inherent value of research outputs (in all their variety).

Each reader’s relationship with an article is different, depending upon their needs. Some articles are dissected and analysed in detail, but others are of interest only for one or two facts, or for specific parts such as the conclusion, a reaction scheme or the spectra. In many of these cases, the information reported in an article may not be the most appropriate for one’s purposes. For example, to repeat a reaction, a lab notebook is more useful; to understand a simulation, a software log file. Does this make these outputs, currently considered ‘secondary’, more valuable in these contexts?

If we then consider all the other outputs of research, there are many more sources of data, such as posters, reports, proposals, theses and talks. Useful information can so often be gleaned from sources like these, yet it never makes it into the published literature.

For years, individuals have been requesting these details and data from colleagues via letters, and more recently emails, but we need to move on. Such one-to-one, informal communication is inadequate for the needs of a data-based economy. For one, it cannot scale to the levels that will be required for the massive exchange of data. Also, a data economy demands that data be discoverable – how does one find data when there is no associated article? Further, tomorrow’s informatics-driven world will be swamped with data, and in this environment information has to be published in a machine-processable way. Finally, we must consider validity and quality control – in a one-to-one exchange, one has only the provider’s assurance that the data are reliable, whereas by formally disseminating data, some manner of peer review or rating can be applied.
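To make ‘machine-processable’ concrete, here is a minimal sketch in Python of what a formally published reaction record might look like. The schema, field names and identifier are invented purely for illustration; a real system would adopt community standards and persistent identifiers so that records are discoverable without an accompanying article.

```python
import json

# A minimal sketch of a machine-processable reaction record.
# The schema and field names are invented for illustration only;
# no existing standard is implied.
record = {
    "id": "example-reaction-001",  # placeholder for a persistent identifier
    "reaction": {
        "reactants": ["InChI=1S/CH4O/c1-2/h2H,1H3"],  # methanol, as an InChI
        "products": [],
        "conditions": {"temperature_K": 298, "solvent": "water"},
        "yield_percent": 0.0,  # failed attempts are recorded too
    },
    "provenance": {
        "source": "electronic lab notebook entry",
        "creator": "A. Researcher",
        "peer_reviewed": False,  # quality flags travel with the data
    },
}

# Serialised this way, the record can be indexed, searched and
# aggregated by software, which is exactly what efforts such as
# Dial-a-Molecule need: every attempted reaction, whatever the outcome.
print(json.dumps(record, indent=2))
```

The point is not the particular format but that structure, identifiers and quality flags are attached to the data itself, so software, rather than a human reader, can find and assess it.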

In principle, the internet is well suited to our needs – technically, it is possible to link to any object or point within an article, and indeed to any object on the internet. So we have the means to find (and pay for, or subscribe to) only the snippet of data or information that is valuable to us. Perhaps the publishing of the future will involve ‘brokered micro-transactions’, where subscribers receive only the information that is really valuable to them. And of course, this raises the question of how these transactions would be mediated, and by whom. If the body of knowledge is not organised into discrete papers but is a continuum of data-publishing components of 140 characters or fewer, would you subscribe?

Simon Coles is a senior lecturer at the University of Southampton and director of the UK National Crystallography Service