Analytical chemist Heather Desaire and her team at the University of Kansas (KU) have created a detector they claim is 98–100% effective at identifying chemistry papers generated by large language models (LLMs) like ChatGPT.1 The researchers argue that their tool can help scientific publishers detect and prevent improper use of artificial intelligence (AI) in academic journals.

The researchers first unveiled their detector in June, when they applied it to Perspectives articles from Science and found that it recognised ChatGPT-generated scientific text with over 99% accuracy.2 But now they have dramatically expanded the tool’s scope by testing it on chemistry papers.

The KU detector was trained on 100 introductory passages from 10 journals published by the American Chemical Society. The team then tasked ChatGPT with writing similar passages.

The machine learning model correctly identified human-authored passages 100% of the time, as well as ChatGPT-generated passages produced from prompts containing only the papers' titles. The results were almost as good when ChatGPT was instead prompted with the papers' introductions, with correct identification 98% of the time.
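For readers curious what building such a detector involves, the sketch below shows one way a feature-based human-versus-ChatGPT classifier could be trained with off-the-shelf machine learning tools. The stylistic features, placeholder corpora and choice of classifier are illustrative assumptions, not the KU team's actual code.

```python
# Minimal sketch of a feature-based human-vs-ChatGPT text classifier.
# Features, corpora and classifier choice are illustrative assumptions,
# not the KU team's actual implementation.
import re
from sklearn.ensemble import GradientBoostingClassifier

def stylistic_features(passage: str) -> list[float]:
    """Reduce one introductory passage to simple writing-style numbers."""
    sentences = [s for s in re.split(r"[.!?]+\s+", passage) if s.strip()]
    words = passage.split()
    lengths = [len(s.split()) for s in sentences] or [0]
    return [
        sum(lengths) / len(lengths),              # mean sentence length
        max(lengths) - min(lengths),              # sentence-length spread
        passage.count(",") / max(len(words), 1),  # comma density
        passage.count("(") + passage.count(")"),  # parenthesis use
        sum(w[0].isupper() for w in words if w),  # capitalised words
    ]

# Placeholders: in the study these would be the 100 journal introductions
# and the passages ChatGPT wrote from matching prompts.
human_intros = ["An example human-written introduction. It cites prior work."]
chatgpt_intros = ["An example ChatGPT-written introduction. It is fluent."]

X = [stylistic_features(p) for p in human_intros + chatgpt_intros]
y = [0] * len(human_intros) + [1] * len(chatgpt_intros)  # 0 = human, 1 = AI

model = GradientBoostingClassifier().fit(X, y)
print(model.predict([stylistic_features("A new, unseen introduction.")]))
```

With only the toy two-passage corpus above the model is meaningless; the point is the shape of the pipeline – featurise, label, fit, predict.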

‘The big motivation was to look more broadly at a selection of journals … and we wanted to challenge the approach with more complex and diverse prompts,’ Desaire explains.

The detector was then given a tougher test. It was put up against samples that weren't used in training but were of the same kind as the training data – in this case, 150 introductions from three chemistry journals outside the original training set. A later release of ChatGPT, used to improve the AI-generated text, made the task harder still. The detector was nevertheless able to classify the new text correctly 92–98% of the time across the three journals.
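Continuing the illustrative sketch above (same caveats: the journal names and corpora below are hypothetical placeholders), this kind of held-out check amounts to scoring the already-trained model on passages from journals it never saw:

```python
# Out-of-sample check: score the trained detector on introductions from
# journals that were not in the training set. Reuses stylistic_features()
# and model from the sketch above; journal names and data are placeholders.
from sklearn.metrics import accuracy_score

held_out = {
    "journal_A": (["human intro ..."], ["ChatGPT intro ..."]),
    "journal_B": (["human intro ..."], ["ChatGPT intro ..."]),
    "journal_C": (["human intro ..."], ["ChatGPT intro ..."]),
}

for journal, (human, ai) in held_out.items():
    X_test = [stylistic_features(p) for p in human + ai]
    y_test = [0] * len(human) + [1] * len(ai)
    print(journal, f"{accuracy_score(y_test, model.predict(X_test)):.0%}")
```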

Research fraud experts point out that the ChatGPT detector could be used as a tool to flag suspect papers, which would then be investigated by a journal's reviewers or editors to determine whether they are fake.

Elisabeth Bik, a microbiologist and scientific integrity consultant in the US, is enthusiastic about the study. ‘This is a welcome new tool that might greatly help editors of scientific journals to screen incoming manuscripts for computer-generated texts, similar to the use of … plagiarism-detection software,’ she tells Chemistry World.

But Saniat (John) Sohrawardi, a fifth-year PhD student at the Rochester Institute of Technology in New York who works on deepfake detection, has some reservations. ‘No journal, no academic venue should use the tool as the only justification to reject any paper,’ he states. ‘I do believe that their work has merit as a first pass, provided it’s efficient enough and low resource enough, but there has to be a disclaimer saying that this is not to be used as definitive proof to reject the paper.’

Concerns about overhyped claims

Several experts in this area are sceptical that any AI detector can reach such high reliability, however. Debby Cotton, director of academic practice at Plymouth Marjon University in the UK, points out that early experimentation with many of these detectors has suggested high accuracy, but this has rarely been borne out once they are tested more widely.

Cotton, who authored a recent study that examined the use of AI in higher education3, says that because this latest tool was trained on a narrow field, it is likely to perform better than most. Nevertheless, she suggests that it is usually quite easy to evade such detectors with some superficial human editing, pointing out that there is even a new service called Undetectable AI that helps authors who want to disguise the AI origins of their work.

Reuben Shipway, a marine biology lecturer at the University of Plymouth who is a co-author on Cotton’s paper, concurs. ‘What’s stopping authors from simply writing using LLMs, screening the output against the detection software and then modifying the output until it scores low on the detection software?’ he asks. ‘At the moment, nothing.’

Desaire now wants to determine the extent to which ChatGPT has infiltrated the research enterprise. ‘We have a detector that is useful for looking for ChatGPT contributions to academic science writing, and so the next step would be to then apply it to academic science writing and see,’ she says.

‘I don’t think anybody really knows how much ChatGPT is contributing to the academic literature – is it zero, is it 20%?’ Desaire asks. She says the goal of her future study will be to elucidate how common unacceptable use of ChatGPT is in the scientific literature.