With more article submissions and fraudulent activity than ever before, journal peer review processes are creaking under the pressure. Nina Notman discovers how AI and automated tools are taking some of the strain
- Paper mill fraud is overwhelming peer review systems, with publishers like Hindawi retracting thousands of fraudulent articles. AI and automated tools are now being deployed to detect and prevent such submissions before they enter the peer review process.
- Research integrity platforms like STM Integrity Hub and tools from companies such as Clear Skies and Cactus Communications use multiple checks – including network analysis, author credentials, reference validation and detection of AI-generated content – to flag suspicious papers.
- New AI tools are also supporting editors and reviewers, with systems like Alchemist Review offering manuscript summaries, method critiques and interactive chat functions. Some publishers allow limited AI use to improve clarity in peer review reports, while others strictly prohibit generative AI due to confidentiality and accuracy concerns.
- Post-publication fraud detection is growing, with initiatives like the Problematic Paper Screener identifying published papers with signs of manipulation. However, experts argue that systemic reform – such as reducing reliance on peer-reviewed journals and embracing preprints – is essential to truly restore integrity in scientific publishing.
This summary was generated by AI and checked by a human editor
In 2023, Hindawi – a branch of journal publishing giant Wiley – retracted over 8000 scientific articles. They had all been produced by paper mills: unofficial set-ups that generate plagiarised or fraudulent papers that resemble genuine research, then sell authorship on them. Paper mills have been infiltrating academic journals at an increasing rate for over a decade, but the scale of the fraud discovered at Hindawi that year was a wake-up call for the publishing industry.
Today, ‘around one in 50 papers’ have patterns that suggest that they are from paper mills, says Adam Day, chief executive of Clear Skies, a publishing data analytics company based in London, UK. Peer review has been the main quality control process for journals for decades, but the paper mill crisis has overwhelmed this gatekeeping mechanism. ‘The fact that there’s so much dishonesty is because the peer review system is not working as it should,’ Day says.
Publishers are fighting back with increasingly sophisticated automated and artificial intelligence (AI) technology that spots signs that articles were produced by paper mills, and other types of fraud. Many of these tools aim to flag fraudulent papers before they reach referees. ‘The aim is to go more upstream in our detection. We want to make sure that [fraudulent] papers do not enter the peer review process let alone the body of published literature,’ says Joris Van Rossum, programme director for STM Solutions, part of the International Association of Scientific, Technical and Medical Publishers (STM), in the Hague, Netherlands.
The tools don’t make the final decision but rather provide data and information to support editorial staff in preliminary paper assessments. ‘Our job is to bring the relevant signals to the publishing teams … so they make the decision if the submission should be rejected, needs further manual investigation, or if it can be passed along to peer review,’ says Hylke Koers, chief information officer at STM Solutions.
The scale of the problem
Research integrity tools are being developed by publishers, including Frontiers, Wiley, Elsevier and Springer Nature, and commercial entities, such as Clear Skies and Cactus Communications. Each of the tools looks for multiple types of nefarious behaviour to build up a comprehensive picture of articles, because there is no single indicator that can tell a paper is from a paper mill. ‘We look at 25 different elements of papers,’ says Christopher Leonard, director of product solutions at Cactus Communications, based in Mumbai, India. Cactus launched its research integrity checker, Paperpal Preflight for Editorial Desk, in 2023, in the wake of the Hindawi episode.
Also in 2023, STM Solutions started building a research integrity platform that pulls together aspects of various publisher and commercial systems. The STM Integrity Hub currently has 15 different checks. It is embedded in editorial office management software and currently used by over 30 publishers, including the Royal Society of Chemistry and the American Chemical Society. ‘We screen over 125,000 manuscripts per month now and we expect that number to increase significantly by the end of this year,’ Van Rossum says.
Different systems present results in different ways, with some providing a traffic light result for each individual check and others a composite score. Clear Skies’ system, the Papermill Alarm, for example, uses network analysis to get a composite score. A concept borrowed from modern bank fraud detection, network analysis can spot relationships between various individual fraud check results that are beyond the capability of humans. Machine learning is also used to probe the less obvious connections and reduce the chance of false positives, says Day.
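How such a composite score might be assembled can be illustrated with a minimal sketch. The Papermill Alarm's actual features, weights and graph construction are not public, so the shared attribute (an email domain) and the flag counts below are assumptions for illustration only: submissions that share a suspicious attribute are linked in a graph, and a paper's score rises with the red flags carried by its neighbours.

```python
# Illustrative sketch of network-based screening (not the actual Papermill Alarm).
# Submissions that share an attribute are linked; a paper's composite score
# combines its own red flags with those of its neighbours in the graph.
import networkx as nx

# Hypothetical screening results: per-submission flag counts and one shared attribute.
submissions = {
    "MS-001": {"email_domain": "mail-x.example", "flags": 2},
    "MS-002": {"email_domain": "mail-x.example", "flags": 3},
    "MS-003": {"email_domain": "uni.example", "flags": 0},
}

G = nx.Graph()
G.add_nodes_from(submissions)

# Connect submissions that share a corresponding-author email domain.
ids = list(submissions)
for i, a in enumerate(ids):
    for b in ids[i + 1:]:
        if submissions[a]["email_domain"] == submissions[b]["email_domain"]:
            G.add_edge(a, b)

# Composite score: a paper's own flags plus the flags of its neighbours.
for paper in G.nodes:
    score = submissions[paper]["flags"] + sum(submissions[n]["flags"] for n in G.neighbors(paper))
    print(paper, "composite score:", score)
```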
How AI detects academic fraud
The fine details of how the research integrity tools work are kept under wraps to avoid paper mills developing means to circumvent them. New functionalities are also regularly added to tackle the paper mills’ evolving tactics. ‘Paper mills are constantly keeping one step ahead of publishers,’ says Shilpi Mehra, head of research integrity and Paperpal Preflight at Cactus Communications.
Checks on author credentials – including publication history, prior retractions and whether their affiliations and email domains are real – are common across most tools. So too are checks on references: that they exist, haven't been retracted, aren't all very old, match the topic of the paper, and don't include excessive numbers of self-citations or citations to a single group. Missing disclosure statements are another red flag some systems look for. 'We used a Google face recognition algorithm and trained it on articles … to be able to detect if there were faces, tattoos or other things that needed ethical guidelines or consent from patients,' says Marie Soulière, head of editorial ethics and quality assurance at publishers Frontiers in Lausanne, Switzerland.
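A rough sketch of how such reference checks might look in code is below; the reference data, the 15-year age threshold and the stand-in retraction list are all hypothetical, and production tools query live databases rather than hard-coded sets.

```python
# Illustrative reference checks of the kind described above (hypothetical data).
from datetime import date

references = [
    {"doi": "10.1000/a1", "year": 2003, "authors": {"Lee", "Kumar"}},
    {"doi": "10.1000/b2", "year": 2021, "authors": {"Smith"}},
    {"doi": "10.1000/c3", "year": 2022, "authors": {"Smith", "Jones"}},
]
submitting_authors = {"Smith", "Jones"}
known_retractions = {"10.1000/a1"}  # stand-in for a real retraction database

retracted = [r["doi"] for r in references if r["doi"] in known_retractions]
self_cited = sum(1 for r in references if r["authors"] & submitting_authors)
old = sum(1 for r in references if date.today().year - r["year"] > 15)

print("Retracted references:", retracted)
print("Self-citation share: %.0f%%" % (100 * self_cited / len(references)))
print("References older than 15 years: %.0f%%" % (100 * old / len(references)))
```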
In terms of article content itself, plagiarism checks are standard. Some tools also use AI to scan for AI-generated images and text, something that is currently a challenge due to the ease with which these detectors can be tricked. 'If you add a few typos there's almost no chance that the AI will think that it's AI generated, because AI doesn't make typos,' explains Soulière. Some systems, such as the STM Integrity Hub and Clear Skies' Papermill Alarm, search for 'tortured phrases' as evidence that text has been written by large language models, like ChatGPT and Google Gemini, or is plagiarised text rephrased by machine-paraphrasing tools such as SpinBot. The use of tortured phrases to detect AI-derived text is championed by Guillaume Cabanac, co-founder of the Problematic Paper Screener and professor of computer science at the University of Toulouse, France. Examples of commonly found tortured phrases include 'cruel temperature' instead of 'mean temperature', 'amino corrosive' instead of 'amino acid', and 'brilliant gadgets' instead of 'smart devices'.
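A tortured-phrase scan can be as simple as matching manuscript text against a curated list of substitutions. The minimal sketch below uses only the three examples quoted above; real screeners maintain lists running into the thousands.

```python
# Minimal tortured-phrase scan using the examples quoted in the article.
import re

TORTURED_PHRASES = {
    "cruel temperature": "mean temperature",
    "amino corrosive": "amino acid",
    "brilliant gadgets": "smart devices",
}

def find_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, expected phrase) pairs found in the text."""
    lowered = text.lower()
    return [
        (tortured, expected)
        for tortured, expected in TORTURED_PHRASES.items()
        if re.search(r"\b" + re.escape(tortured) + r"\b", lowered)
    ]

sample = "The cruel temperature of the mixture was recorded by brilliant gadgets."
print(find_tortured_phrases(sample))
# [('cruel temperature', 'mean temperature'), ('brilliant gadgets', 'smart devices')]
```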
Paper mill papers are often submitted simultaneously to multiple journals. But the siloed nature of journal publishing has historically hindered editors from seeing submissions made to journals at other publishers before they are published. The STM Integrity Hub recently launched a duplicate submission check that looks at all papers under consideration across multiple publishers. ‘Publishers using the system estimate that about 50% of duplicate submissions are probably from paper mills,’ says Van Rossum.
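How the Hub matches manuscripts across publishers has not been made public, but the general idea of a duplicate-submission check can be sketched with a simple title-similarity comparison; the token-overlap approach, threshold and example titles below are assumptions for illustration.

```python
# Toy duplicate-submission check: flag pairs of pending manuscripts whose
# titles share most of their content words (illustrative approach only).
def content_words(title: str) -> set[str]:
    return {w.strip(".,:;").lower() for w in title.split() if len(w) > 3}

def title_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the content words in two titles."""
    ta, tb = content_words(a), content_words(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

pending = [
    "Novel green synthesis of silver nanoparticles from leaf extract",
    "A novel green synthesis of silver nanoparticles using leaf extracts",
]
score = title_overlap(pending[0], pending[1])
print(f"Title similarity: {score:.2f}")
if score > 0.5:
    print("Possible duplicate submission across publishers")
```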
Supporting editors with automation
Fraud isn’t the only challenge that automated checks can help editorial offices with. Some of the above research integrity systems have expanded to include simple checks that are time-consuming to do manually. These free up editor and referee time to focus on assessing the quality of the science. Automated checks of this type include checking reference details, language clarity, the labelling of tables and figures, and the presence of disclosure statements for ethics or conflict of interest. Some systems also check if articles adhere to the page limits and formatting style of the journal they are being submitted to.
Cactus Communications is moving these simple checks even further upstream with a web-based tool that lets authors run automated article assessments pre-submission. Journals direct authors to the Paperpal Preflight for Authors tool on their submission pages or alongside the online author guidelines. Launched in 2020, Paperpal Preflight for Authors is 'live on nearly 1000 journals with about 30 different publishers and societies', says Mehra.
Earlier this year, three physics society publishers – AIP Publishing, Institute of Physics Publishing and the American Physical Society – participated in the prototype development of a next-generation tool to support journal review. The aim was to explore how automation could assist with routine tasks and reduce the time editors and referees spend assessing manuscripts. The prototype has since matured into a working tool, Alchemist Review, with additional capabilities and is being further developed by the AI company Hum in partnership with GroundedAI.
In addition to a suite of checks similar to those in other currently available tools, Alchemist Review performs deep textual analysis to generate AI summaries of papers, helping editors make more rapid assessments. It also extracts and evaluates the methods in papers, providing a critique of whether those methods are appropriate for their aims, and assesses whether the content of the references supports the claims made in the submitted paper. 'There's also an AI-powered chat [function] where you can ask Alchemist Review questions about the manuscript,' says Ann Michael, chief transformation officer of AIP Publishing in New York, US.
‘Initially, editors will use this to help [determine if a manuscript] is something they want to send to review,’ says Michael. If successful, longer term the team are excited to explore where the technology may be able to help in other areas of peer review, she adds.
Should reviewers use chatbots?
Once publishing houses make the leap to providing their referees with access to robust closed-source AI tools to support article assessments, considerable time savings seem likely. Meanwhile, the pressure on the peer review system continues to grow.
Since the launch of ChatGPT in November 2022, some overwhelmed peer reviewers have been seduced into using large language models to write their peer review reports. James Zou, a biomedical data scientist at Stanford University in California, US, and his colleagues assessed 50,000 peer reviews for computer-science conference proceedings published in 2023 and 2024. They found that large language models had written up to 17% of the sentences in the reviews. 'We noticed that the way that people were writing reviews, and the kinds of words that they were using in those reviews, started to look quite different from 2022 compared to before,' says Zou. AI signature words such as 'commendable' and 'meticulous', for example, started to show up with much higher frequency. These words 'are much more likely to be used by large language models like ChatGPT than by humans', Zou explains.
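The kind of frequency comparison behind this finding can be sketched as follows; the signature-word list and the toy reviews are illustrative only, not Zou's actual corpus or methodology.

```python
# Toy comparison of 'AI signature word' frequency in two sets of reviews.
from collections import Counter
import re

SIGNATURE_WORDS = {"commendable", "meticulous", "intricate", "notable"}

def signature_rate(reviews: list[str]) -> float:
    """Signature words per 1000 words across a set of reviews."""
    words = [w for review in reviews for w in re.findall(r"[a-z]+", review.lower())]
    hits = sum(Counter(words)[w] for w in SIGNATURE_WORDS)
    return 1000 * hits / len(words) if words else 0.0

pre_2023 = ["The method is sound but the evaluation needs more baselines."]
post_2023 = ["The authors present a commendable and meticulous evaluation of the method."]
print("Earlier reviews:", round(signature_rate(pre_2023), 1), "per 1000 words")
print("Later reviews:", round(signature_rate(post_2023), 1), "per 1000 words")
```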
Currently, all publishing houses explicitly state in their referee guidelines that generative AI must not be used by peer reviewers. Confidentiality is one reason. 'If you use a public tool and upload an unpublished article to it, it's going to be ingested by the AI, and you're going to breach confidentiality agreements,' says Frontiers' Soulière, who is also a Committee on Publication Ethics council member. This confidentiality issue can be avoided by using closed-source AI or only uploading manuscripts that are already preprints. But accuracy problems remain. 'Generative AI tools can lack up-to-date knowledge and may produce biased or incorrect information,' says a spokesperson for Springer Nature. Generative AI 'is likely to provide reports with errors and hallucinations of references', says Soulière, adding that 'referees are likely to spend more time … trying to fix [an AI-generated review] than doing the review in the first place on their own.'
Non-generative AI is, however, already finding niche uses in supporting peer reviewers at some publishers. Wiley, for example, permits referees to use AI to improve the clarity of writing in peer review reports. ‘This use must be transparently declared upon submission of the peer review report to the manuscript’s handling editor,’ explains a Wiley spokesperson.
Zou sees potential for closed-source large language models to improve the quality of referee reports in other ways in the future. His group is developing a Review Feedback Agent that uses large language models to provide automated feedback to reviewers. When the tool detects unclear or unprofessional statements it prompts referees to rephrase them, while detected broad statements prompt a request to make them more specific. A simple statement that work is ‘not novel’, for example, will prompt reviewers to rephrase along the lines of ‘this work is very similar to these previous papers, A, B and C; can the authors distinguish their work’, explains Zou. These nudges to the referees are optional; they don’t have to make suggested changes.
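A heavily simplified stand-in for this behaviour is sketched below. The real agent chains several large language models; here the vague-phrase list is an assumption, and call_llm is a hypothetical placeholder that returns a canned suggestion rather than querying a model.

```python
# Simplified stand-in for a review feedback loop: spot broad, unsupported
# statements and offer an optional, more specific rephrasing.
VAGUE_PHRASES = ["not novel", "poorly written", "lacks impact"]

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder; a real system would query a closed-source model.
    return ("Consider rephrasing, for example: 'This work is very similar to "
            "previous papers A, B and C; can the authors distinguish their work?'")

def review_feedback(review_text: str) -> list[str]:
    """Return optional suggestions for each vague statement found in a review."""
    lowered = review_text.lower()
    suggestions = []
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            prompt = f"The reviewer made a broad claim ('{phrase}'). Suggest a more specific version."
            suggestions.append(call_llm(prompt))
    return suggestions

print(review_feedback("The work is not novel and lacks impact."))
```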
The Review Feedback Agent uses multiple large language models to improve feedback quality and minimise hallucinations. Zou's team tested the agent on over 20,000 reviews written about articles submitted to the 2025 International Conference on Learning Representations, an AI conference. They found that 27% of reviewers who received the feedback went on to make changes to their reports. 'Over 12,000 specific suggestions from the AI were incorporated by the human reviewers,' says Zou. The agent is now publicly available on the open-source software community GitHub.
Post-publication fraud detection
It is inevitable that the use of AI and automated tools will become more and more prevalent in the assessment of papers, both by editors and referees. Most expect that doing so will reduce some of the strain in the system and slow the flow of poor-quality papers into the published literature. But for now an overwhelming amount of nonsense is still ending up in the published literature. Some research integrity sleuths are also turning to AI and automated tools to help them flag fraudulent papers that have been published.
Cabanac is one such sleuth. In 2021, he released the Problematic Paper Screener, which trawls the 130 million published scientific papers on the database Dimensions looking for tortured phrases and other fingerprints of fraudulent activity. As of September 2025, more than 7500 tortured phrases had been added to the list used by the Problematic Paper Screener. 'Surface region' instead of 'surface area' held the top spot, having been spotted in 42,500 published papers at that point. Papers found to have five or more tortured phrases (a threshold chosen to avoid false positives) are published on an online spreadsheet. This list is then reviewed by a network of sleuths, who probe the papers manually. 'We run it as a crowdsourcing initiative,' says Cabanac. If their examination confirms a paper is problematic, the sleuths post their findings on the post-publication peer review platform PubPeer. In some cases, the publishers are also contacted directly. 'Science doesn't auto-correct,' Cabanac explains; it needs 'people who want to dedicate time and effort to identify problems and then to contact the publishers asking for the retraction'. More than 3000 papers have now been retracted by journals at the sleuths' request, he adds.
But although AI and automated tools can significantly improve the detection of fraudulent papers before (and after) they are published, some experts believe that their use is analogous to rearranging deck chairs on the Titanic. 'We need to be more honest about how strained the system is and how it's not doing what everybody says it's doing and wants it to do,' says Ivan Oransky, co-founder of Retraction Watch, a website that tracks retractions of scientific papers and other ethical issues in science publishing. Fixing the capacity problem – by publishing fewer peer-reviewed papers – is the only way to restore the integrity of the peer review process, he adds. 'We need to have fewer peer-reviewed papers,' he clarifies, but this doesn't mean publishing less; it means making more use of preprint servers, where peer review isn't required.
Oransky acknowledges that a major about-face by academia and the global publishing industry is required for this to happen. Academia currently operates a 'publish or perish' system, in which researchers need to publish extensively in peer-reviewed journals to succeed in their careers. 'The publishers have [also] created business models that incentivise publishing more and more articles,' Oransky says. 'Even [before open access] when it was subscription models, they had to create more journals and sell them to universities and libraries.' These are systemic problems and fixing them – if indeed possible – will take decades. In the meantime, the increased use of automated tools to ease editor and reviewer workloads on individual papers is inevitable.
Nina Notman is a science writer based in Salisbury, UK
