AI-generated images could make it almost impossible to detect fake papers

In mid-March, a one-minute video of Ukraine’s president Volodymyr Zelenskiy appeared first on social media and later on a Ukrainian news website. In it, Zelenskiy told Ukrainian soldiers to lay down their arms and surrender to Russian troops. But the video turned out to be a deepfake, a piece of synthetic media created by machine learning.

Some scientists are now concerned that similar technology could be used to commit research fraud by creating fake images of spectra or biological specimens.

‘I’ve been worried very much about these types of technologies,’ says microbiologist and science integrity expert Elisabeth Bik. ‘I think this is already happening – creating deepfake images and publishing [them].’ She suspects that the images in the more than 600 completely fabricated studies she helped uncover, which likely came from the same paper mill, may have been AI-generated.

Unlike manually manipulated images, AI-generated ones could be almost impossible to detect by eye. In a non-peer-reviewed case study, a team led by computer scientist Rongshan Yu from Xiamen University in China created a series of deepfake western blot and cancer images. Two out of three biomedical specialists were unable to distinguish them from the real thing.

These esophageal cancer images are deepfakes that were created by a generative adversarial network

The problem is that deepfake images are unique, says Yu. They show none of the traces people usually look for – duplicated elements and background inconsistencies, for example. Moreover, ‘deepfake and other tools are now highly available’, says Yu. ‘It isn’t rocket science, you don’t need a top expert in AI to use them.’

Deepfakes are often based on generative adversarial networks (Gans), in which a generator and a discriminator try to outcompete each other. ‘One network tries to generate a fake image from white noise, let’s say a face,’ explains deepfake technology researcher John (Saniat) Sohrawardi from the Rochester Institute of Technology, US. ‘It doesn’t know how to generate a face initially, so it takes the help of a discriminator, which is another network that learns how to tell apart whether an image is real or fake.’ Eventually, the generator will fool the discriminator into thinking its images are real.
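
The tug-of-war between the two networks can be illustrated with a toy example. The sketch below is not the method used in Yu's study or in any real deepfake tool: it assumes one-dimensional 'images' (single numbers drawn around 4.0), a one-line generator and a one-line discriminator, and all class and function names are made up for illustration. Setups this simple tend to oscillate rather than settle, so it shows only the alternating training loop, not a working image forger.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class Generator:
    """Turns noise z into a 'fake' sample: x = a * z + b."""

    def __init__(self):
        self.a, self.b = 1.0, 0.0

    def sample(self, z):
        return self.a * z + self.b


class Discriminator:
    """Scores a sample: probability that x is real, sigmoid(w * x + c)."""

    def __init__(self):
        self.w, self.c = 0.0, 0.0

    def prob_real(self, x):
        return sigmoid(self.w * x + self.c)


def discriminator_step(d, real_batch, fake_batch, lr):
    # Gradient descent on -log D(real) - log(1 - D(fake)), batch-averaged.
    gw = gc = 0.0
    for x in real_batch:
        p = d.prob_real(x)
        gw += -(1.0 - p) * x
        gc += -(1.0 - p)
    for x in fake_batch:
        p = d.prob_real(x)
        gw += p * x
        gc += p
    n = len(real_batch) + len(fake_batch)
    d.w -= lr * gw / n
    d.c -= lr * gc / n


def generator_step(g, d, noise_batch, lr):
    # Gradient descent on -log D(G(z)): push fakes toward what D calls real.
    ga = gb = 0.0
    for z in noise_batch:
        p = d.prob_real(g.sample(z))
        dx = -(1.0 - p) * d.w  # derivative of the loss w.r.t. the fake sample
        ga += dx * z
        gb += dx
    n = len(noise_batch)
    g.a -= lr * ga / n
    g.b -= lr * gb / n


def train(steps=200, batch=32, lr=0.05, real_mean=4.0, real_std=0.5, seed=0):
    # Alternate one discriminator update and one generator update per step.
    rng = random.Random(seed)
    g, d = Generator(), Discriminator()
    for _ in range(steps):
        real = [rng.gauss(real_mean, real_std) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [g.sample(z) for z in noise]
        discriminator_step(d, real, fake, lr)
        generator_step(g, d, noise, lr)
    return g, d
```

Calling `train()` returns the two trained objects. Each step, the discriminator first learns to score real samples higher than fakes, then the generator shifts its output in whatever direction raises its score. Real Gans follow the same alternating pattern, but with deep neural networks and images in place of single numbers.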

Given that Gans can synthesise faces that are indistinguishable from real ones, ‘I don’t think it should come as a shock that it can generate these types of fairly mundane biological images’, says Hany Farid, who specialises in digital forensics and misinformation at the University of California, Berkeley, in the US. But while deepfakes are a threat to be taken seriously, ‘I’m far more concerned about reproducibility, p-hacking, Photoshop manipulation – the old school stuff, which is still going to, I suspect, dominate for quite a while.’

Matthew Wright, director of Rochester’s Global Cybersecurity Institute, agrees. ‘I just don’t find this to be particularly threatening, even though it’s technically quite possible and probably difficult to detect if someone did it.’

The digital artefacts left behind by machine learning could be used to identify fake images, explains Farid, though fraudsters usually find a way around such methods after only a few months. ‘In the end, the only real solution are the active ones, authenticate with hard encryption at the point of recording,’ Farid says. He believes that science’s self-correcting mechanisms will eventually dispose of fake research.
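
Farid does not spell out a specific scheme, but the general idea of authenticating at the point of recording can be sketched as follows. This is an illustration under stated assumptions: the instrument holds a secret key and attaches a cryptographic tag to each image as it is captured, so any later change to the pixels fails verification. A keyed hash (HMAC) stands in for the public-key signature a real system would more likely use, and the key, function names and image bytes are all hypothetical.

```python
import hashlib
import hmac


def sign_at_capture(image_bytes: bytes, device_key: bytes) -> str:
    # Tag computed by the instrument at the moment the image is recorded.
    return hmac.new(device_key, image_bytes, hashlib.sha256).hexdigest()


def verify_image(image_bytes: bytes, tag: str, device_key: bytes) -> bool:
    # Recompute the tag and compare it in constant time.
    expected = sign_at_capture(image_bytes, device_key)
    return hmac.compare_digest(expected, tag)


# Hypothetical key and data, for illustration only.
device_key = b"hypothetical-instrument-secret"
original = b"raw pixel data from the instrument"
tag = sign_at_capture(original, device_key)
```

A journal holding the tag could then call `verify_image` on the submitted file. An image that was edited after capture, or generated without the instrument's key, would not match.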

Yu says it’s unclear whether the literature already contains AI-generated images. ‘I think we have reached the point where we can no longer tell if the paper is real or fake,’ says Bik. ‘We need to work harder with institutions to have them … take part of the responsibility,’ she suggests, and take away pressure from researchers whose entire career might hinge on publishing in an international journal.