US-based AI company Anthropic has reached a first-of-its-kind settlement with authors whose books it used, without prior permission, to train its models.
Under the settlement, announced at the end of September, Anthropic must pay the authors of books – including academic ones – $1.5 billion (£1.1 billion) after downloading their works from the pirate websites LibGen and Pirate Library Mirror (PiLiMi) to train its AI models.
Although Anthropic, which developed the Claude family of large language models (LLMs), reportedly downloaded around 7 million books to train its models, the authors of only around half a million of them will be compensated.
That’s because their titles meet the settlement’s criteria: they were downloaded from LibGen or PiLiMi, have valid ISBN or ASIN numbers, and were registered with the US Copyright Office within three months of publication or before Anthropic downloaded them.
Rightsholders are likely to be eligible for around $3000 per title, and can search for their books on the eligible works list and file a claim.
Unclaimed funds will be redistributed to authors who have made claims, in proportion to what they have already received, says Mary Rasenberger, chief executive officer of the New York-based Authors Guild, an advocacy group for writers that helped the lawyers working on the case to develop a claim form. ‘But if the leftover money is too small to make a distribution worthwhile, then it may be given to a recipient selected by plaintiff’s counsel to aid authors broadly (typically a not-for-profit),’ Rasenberger says.
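In rough terms, that pro-rata split works as in the minimal sketch below, which uses entirely invented figures; the settlement’s actual distribution formula has not been published in this detail.

```python
# Hypothetical illustration of the pro-rata redistribution Rasenberger
# describes: leftover unclaimed funds are split among claimants in
# proportion to what each has already received. All figures are invented.

def redistribute(leftover: float, payouts: dict[str, float]) -> dict[str, float]:
    """Split `leftover` among claimants in proportion to their prior payouts."""
    total = sum(payouts.values())
    return {author: leftover * amount / total for author, amount in payouts.items()}

# Three hypothetical claimants who received roughly $3000 per eligible title.
prior = {"author_a": 3000.0, "author_b": 6000.0, "author_c": 3000.0}
extra = redistribute(1200.0, prior)
print(extra)  # {'author_a': 300.0, 'author_b': 600.0, 'author_c': 300.0}
```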
The Authors Guild is a plaintiff in a copyright infringement lawsuit against OpenAI, the firm that created ChatGPT, in the Southern District of New York. Several other copyright infringement lawsuits against AI companies are also underway.
‘It seems pretty likely that Anthropic was not alone in using pirated material to train its model,’ adds Dylan Ruediger, principal for the research enterprise at Ithaka S+R, a New York-based nonprofit that last year launched a tool that tracks deals between scholarly publishers and AI firms that allow LLMs to train on academic papers and data.
While many scholarly publishers have already signed deals with AI companies, organisations representing researchers are pushing for academics and other authors to be paid when LLMs use their work. Still, Ruediger says, legal clarity on LLMs remains a long way off.
Although the Anthropic lawsuit is focused on books, it’s likely that other pirated academic content, including research papers, is also being used to train AI models, Ruediger says.
‘The size of the settlement itself, and of the damages that Anthropic was seeking to avoid, are really good indicators of a mutual interest between publishers – academic and otherwise – and the commercial LLMs in creating some kind of licensing structure for this material,’ Ruediger adds.
As part of the settlement, Anthropic has agreed to destroy all copies of the pirated books in its possession, Rasenberger explains. ‘The settlement does not address future conduct because it is about past infringement but it will push Anthropic to acquire new books legally,’ Rasenberger notes. ‘It would be too easy to bring another class action suit if they were to [fail to] do so again in the future.’
Chemistry World has contacted Anthropic for comment.