New analysis finds Sci-Hub holds 69% of the 82 million scholarly articles registered with Crossref

More than 92% of chemistry journal papers – most of which are paywalled – are available for free from pirate site Sci-Hub, an analysis has revealed.

Sci-Hub – established by former neuroscientist Alexandra Elbakyan in September 2011 – provides illegal access to 62 million scholarly papers and books, bypassing paywalls. Elbakyan, who operates the site out of Russia, could not be reached for comment, but Sci-Hub did tweet responses to some of the study’s claims on 2 August.

The study, posted before peer review on PeerJ Preprints, identifies chemistry as the most heavily pirated discipline on Sci-Hub, although no field’s coverage is below 75%. Separately, another recently published study found that less than 20% of the chemistry literature is free to read.


On 19 March, Sci-Hub – most popular among researchers in China, India and Iran – released a list of the digital object identifiers (DOIs) of all the papers it could successfully find. By comparing this list with the DOI registry Crossref, the authors found that Sci-Hub contains nearly 69% of the 81.6 million articles registered there. Considering only paywalled content, the figure rises to over 85%.
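At heart, a coverage figure like this is set arithmetic: the share of registry DOIs that also appear in Sci-Hub’s released list. The sketch below assumes both lists are available as plain collections of DOI strings; the helper names `normalise` and `coverage` and the toy DOIs are hypothetical illustrations, not the study’s code or data. Because DOI names are case-insensitive, the sketch lower-cases them before comparing.

```python
def normalise(doi: str) -> str:
    """DOI names are case-insensitive, so compare them in lower case."""
    return doi.strip().lower()


def coverage(catalogue: set[str], registry: set[str]) -> float:
    """Fraction of registry DOIs that also appear in the catalogue."""
    registry_norm = {normalise(d) for d in registry}
    catalogue_norm = {normalise(d) for d in catalogue}
    if not registry_norm:
        raise ValueError("registry is empty")
    # Only DOIs present in both sets count towards coverage.
    return len(registry_norm & catalogue_norm) / len(registry_norm)


# Hypothetical toy data, not real DOIs from the study.
crossref = {"10.1000/a1", "10.1000/a2", "10.1000/a3", "10.1000/a4"}
scihub = {"10.1000/A1", "10.1000/a2", "10.1000/a3", "10.1000/zz"}
print(f"{coverage(scihub, crossref):.0%}")  # 75%
```

Under this approach, a subset figure such as the one for paywalled content would come from passing only the paywalled registry DOIs as `registry`. DOIs that Sci-Hub holds but the registry lacks (like `10.1000/zz` here) do not raise the figure.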

‘It’s hard for me to see a successful model that includes subscriptions in an era when content is available in this manner,’ says Casey Greene, a computational biologist at the University of Pennsylvania and one of the study’s co-authors.

Lawsuits

Sci-Hub’s activities haven’t gone unnoticed. In June, a US court granted publishing giant Elsevier $15 million (£12 million) in damages from Sci-Hub, the Library of Genesis and related sites. Later that month, the American Chemical Society (ACS) filed its own lawsuit against Sci-Hub alleging copyright infringement, trademark counterfeiting and trademark infringement.

The latest analysis found that Sci-Hub’s database contains 1.4 million papers from 63 ACS journals, which is 98.8% of the society’s scholarly content. Only the American Physical Society has had a higher percentage of its journal content pirated, at 99.9%, the study reveals. The Royal Society of Chemistry (the publisher of Chemistry World) had 94% of its papers pirated. Elsevier has the largest number of papers on the site: over 13 million from 3356 journals.

Glenn Ruskin, director of external affairs and communications at the ACS, tells Chemistry World: ‘Sci-Hub has amassed the volume of materials it reports to have through theft and deception and in violation of the rule of law.’

In 2016, an analysis by Science of six months of Sci-Hub’s server logs showed that the site received 28 million download requests between September 2015 and February 2016. The same data were also used in another study, which showed that 12 of the 20 journals whose papers are most downloaded from Sci-Hub are chemistry journals.

Confidential donations

The study claims that, from June 2015 to June 2017, Sci-Hub received at least 30 donations per month. During this period, the authors identified 1037 donations totalling 93 bitcoins, equivalent to just over $60,000 at the time. Because Sci-Hub has not spent all of these donations, the authors claim that the unspent bitcoins are now worth around $175,000.

But on 2 August, Sci-Hub tweeted that ‘the information on donations in this study is not very accurate, but I cannot correct it: that is confidential’. Daniel Himmelstein, a data scientist at the University of Pennsylvania and another of the study’s co-authors, notes that this may mean the analysis underestimates donations. ‘Unless Sci-Hub doesn’t control the bitcoin addresses, there is no way we could overestimate donations,’ he says. Himmelstein also points out that donations could be going to bitcoin addresses the authors are not aware of, or could be made through alternative payment methods.

Cassidy Sugimoto, an information scientist at Indiana University Bloomington, US, points to one limitation of the study: Crossref DOIs should not be used to represent the entire scholarly literature, because DOIs are poorly represented in some fields, such as the arts and humanities, and were first introduced in 2000. ‘DOIs tend to underrepresent national and non-English language journals,’ she says, noting that even by the authors’ own estimation, Crossref has registered only 67% of all DOIs. Nevertheless, Sugimoto says that ‘until there is evidence that higher coverage in Sci-Hub relates to lower subscriptions, then we can only assume that Sci-Hub has provided a parallel, but non-disruptive, form of access’.

Recent months have also seen new legal alternatives to Sci-Hub pop up. The browser extension Unpaywall, for example, trawls the web for free-to-read versions of paywalled papers, such as copies that authors have posted online themselves.

But Himmelstein says none of these initiatives, including Sci-Hub, addresses the fact that much of the literature is not free to text mine, as it would be under a Creative Commons licence. He adds that Sci-Hub’s activities are likely to force a move towards publishing models that allow text and data mining. For now, however, he notes, ‘what Sci-Hub provides is free access for humans but what truly open literature would provide is free access for computers’.