When Tamás Kriváchy was in the process of getting one of his papers published, he noticed that one of the references he had included was wrong. On closer inspection, Kriváchy, a physicist at the Barcelona Institute of Science and Technology, found that the error was in the reference he himself had sent the journal. This was especially concerning because he was citing one of his own papers, which he had already referenced in multiple papers before.
Eventually, he managed to trace the issue back to where he believes the reference first went awry: in the bibliography of a journal article published by Springer Nature.
After close inspection, Kriváchy noticed that when he downloaded metadata from the Springer Nature website for any of its online-only journals, the publisher’s system automatically put the article number in what is supposed to be the page number field, and automatically assigned the article to issue 1. But, it turns out, the root of the issue may lie elsewhere.
Article number confusion
Online-only journals, of course, don’t have page numbers, Kriváchy notes. Article numbers, by contrast, refer to the order in which papers are published within a volume of a journal: article number 1, for example, is the first article published in a given volume. ‘This issue one is actually accidentally interpreted as an article number,’ he says. Kriváchy thinks this means that the first article of every volume is likely to be getting an erroneously high number of citations.
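The confusion described above can be illustrated with a short Python sketch. It is a toy model only: the `Article` and `Reference` classes, the journal name and both matching functions are hypothetical, and this is not the publisher's export format or any real reference matching algorithm. It assumes a reference whose exported metadata carries issue 1 and has the article number stored in the page field, and shows how a matcher that reads the issue as the article locator would credit the citation to article number 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Article:
    journal: str
    volume: int
    article_number: int


@dataclass(frozen=True)
class Reference:
    journal: str
    volume: int
    issue: Optional[int]       # assumed to be exported as 1 for every article
    first_page: Optional[int]  # assumed to hold the article number


def _lookup(ref, locator, catalogue):
    """Return the catalogue article whose journal, volume and number match."""
    for art in catalogue:
        if (art.journal, art.volume, art.article_number) == (
            ref.journal, ref.volume, locator
        ):
            return art
    return None


def naive_match(ref, catalogue):
    """Hypothetical flawed rule: read the issue as the article number if present."""
    locator = ref.issue if ref.issue is not None else ref.first_page
    return _lookup(ref, locator, catalogue)


def careful_match(ref, catalogue):
    """Hypothetical corrected rule: ignore the issue, use the page field."""
    return _lookup(ref, ref.first_page, catalogue)


# Made-up catalogue: two articles in the same volume of an invented journal.
catalogue = [
    Article("Example Journal", 12, 1),
    Article("Example Journal", 12, 4567),
]

# A citation that was meant to point at article number 4567.
ref = Reference("Example Journal", 12, issue=1, first_page=4567)

print(naive_match(ref, catalogue).article_number)    # 1
print(careful_match(ref, catalogue).article_number)  # 4567
```

In this sketch every reference carrying the spurious issue 1 lands on the first article of the volume, which is the pattern Kriváchy describes.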
So he decided to conduct an analysis to find out if that’s truly the case. His analysis, published as a non-peer-reviewed preprint on arXiv, examined the citation patterns of three online-only Springer Nature journals: Nature Communications, Scientific Reports and BMC Public Health.
After comparing the first articles of the last 25 volumes collectively published by these three journals with papers of a similar age published in those same journals, Kriváchy says he found that they are ‘consistently massively outperforming their peers in terms of citation count and being referenced’.
According to his analysis, every single first article in all 25 volumes published by the three journals collectively attracts more citations than the average citation count of manuscripts published by the same journals around the same time.
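The kind of comparison described here, a first article's citation count set against the average for papers of similar age in the same journal, can be sketched in a few lines of Python. This is an illustration of the idea rather than the preprint's actual method: the function names are hypothetical and the numbers are invented placeholders, not data from the analysis.

```python
from statistics import mean


def outperforms_peers(first_article_citations, peer_citations):
    """True if the first article has more citations than the peer average."""
    return first_article_citations > mean(peer_citations)


def share_outperforming(volumes):
    """Fraction of volumes whose first article beats its peer average."""
    flags = [outperforms_peers(v["first"], v["peers"]) for v in volumes]
    return sum(flags) / len(flags)


# Invented placeholder counts for three imaginary volumes.
toy_volumes = [
    {"first": 900, "peers": [20, 35, 50]},
    {"first": 400, "peers": [10, 15, 30]},
    {"first": 12, "peers": [10, 20, 30]},
]

print(share_outperforming(toy_volumes))
```

A result of 1.0 would correspond to the situation reported in the article, in which every first article exceeds the average of its peers.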
Among the papers in Scientific Reports and Nature Communications, five of the 10 most highly cited were the first articles within a volume, the study found. ‘Article ones are just being referenced way too often compared to what they should be,’ Kriváchy says.
Citation counts artificially boosted
Due to the citation errors, papers that cite article ones are often about completely unrelated topics and disciplines, Kriváchy notes. ‘I’m just surprised they haven’t noticed up to now,’ he adds.
The problem could have several ramifications, Kriváchy warns. For instance, he notes that scientists will have difficulty finding relevant scientific literature – as well as papers that cite their own work – if references in bibliographies are broken.
It’s also a problem that some articles are having their citation counts – and metrics that rely on them – artificially boosted because of a technical error, Kriváchy says, giving some academics an unfair edge when applying for grants or senior faculty positions.
Kriváchy suspects the issue could affect all Springer Nature journals that use article numbers instead of page numbers. When he compared his findings with journals from some other publishers, Kriváchy didn’t detect the same problem.
Even if the issue is fixed entirely, ‘we’ll still see article number one citation counts outperforming the other ones for quite some time because of this aggregation and just being popular articles’, Kriváchy notes.
‘We take any claims regarding incorrect data seriously – especially when they concern information important to our authors,’ says a spokesperson for Springer Nature, adding that the publisher is looking into the matter. ‘Looking at the conclusions we suspect they could be misleading due to incomplete data.’
Nees Jan van Eck, a senior researcher at the Centre for Science and Technology Studies (CWTS) at Leiden University in the Netherlands, who performed follow-up analyses on Springer Nature data, says citation statistics for some of the publisher’s journal papers are ‘significantly distorted’.
But van Eck says his analysis also found incorrect citation attribution at other publishers. ‘I have seen similar issues for journals of other publishers, including cases where articles with article number 1 or page number 1 receive too many citations,’ he adds. ‘However, the scale is much smaller because these journals publish far fewer articles per issue, resulting in a smaller number of citations attributed to an incorrect article.’
Van Eck thinks the fault lies primarily with the reference matching algorithm of Crossref, which mints and registers digital object identifiers for scholarly content. ‘Since Crossref’s reference linking service is widely used by publishers and Crossref metadata is used by many downstream services (eg Dimensions, OpenAlex, OpenCitations, OpenAIRE), inaccuracies in Crossref metadata may propagate widely,’ he adds.
Dominika Tkaczyk, Crossref’s director of technology in Dublin, Ireland, agrees that there are around 150,000 incorrect citation links, around 25% of which come from publisher-asserted data, while the rest were generated by Crossref’s reference matching process. ‘Automated metadata matching is inherently challenging, given the complexity and noise in the scholarly metadata, and such processes will never achieve 100% accuracy,’ she says.
To solve the problem, Crossref plans to update its algorithm to create a more consolidated reference matching service, Tkaczyk says, as well as fix any existing erroneous links. ‘While no system will be flawless, the new service is being designed to make it faster and simpler to address errors and systematic biases when they arise.’
References
Tamás Kriváchy, arXiv, 2025, DOI: 10.48550/arXiv.2511.01675