Elsevier joins publishers' class-action suit against Meta over Llama training data

Scientific publisher Elsevier has joined a coalition of trade and academic publishers alleging Meta used pirated copyrighted books and research papers to train its Llama large language models.

Elsevier has joined a putative class-action lawsuit against Meta Platforms and its chief executive, Mark Zuckerberg, alleging that the company used copyrighted research articles and books without authorization to train its Llama family of large language models (LLMs). <cite index="7-3,7-4">The case, captioned Elsevier Inc. et al. v. Meta Platforms, Inc. and Mark Zuckerberg, was filed in the U.S. District Court for the Southern District of New York</cite> earlier this month. Elsevier's participation has drawn fresh attention this week because it is the first major scientific publisher to enter the litigation.

The plaintiffs and claims

<cite index="4-9">The named plaintiffs are publishers Cengage Learning, Elsevier, Hachette Book Group, Macmillan Publishers, and McGraw Hill, along with best-selling author Scott Turow, who filed the putative class action against Meta and Zuckerberg for willful infringement of millions of textual works, including literature, educational works, and scholarly articles, to develop Meta's Llama large language models.</cite> <cite index="1-7">"This case is the first AI action brought by major publishing houses, who have their own story to tell about Meta's flagrant violation of their rights," the Association of American Publishers said in a statement</cite> accompanying the filing.

The complaint focuses on two alleged sourcing methods. <cite index="1-12">The lawsuit alleges that Meta used the Common Crawl data set — a sample of billions of web pages made by trawling the internet — which the plaintiffs say is likely to have included unauthorized copies of copyrighted works, such as scientific abstracts and paywalled papers.</cite> <cite index="1-13">The publishers also allege that Meta downloaded and torrented works from sites including LibGen, a database of books, research papers and textbooks, and Sci-Hub, a repository that gives free access to millions of research articles and books regardless of copyright.</cite>

The complaint also raises claims over the removal of copyright management information (CMI) and over Zuckerberg's personal role. <cite index="2-10,2-11">The plaintiffs also allege that Meta stripped copyright notices and author names from datasets to conceal its use of stolen materials. The complaint states that Mark Zuckerberg personally authorized the piracy strategy after being told that licensing books would undermine Meta's fair use legal defense.</cite> <cite index="1-15">Much of the evidence relies on emails between Meta employees that were revealed during a separate case in which several book authors sued Meta last year (Kadrey v. Meta).</cite>

Relief sought and Meta's response

<cite index="2-17">Elsevier requests the maximum amounts in statutory damages allowed by the Copyright Act and the Digital Millennium Copyright Act (DMCA), a full disclosure of all copyrighted works used to train Llama models, an order requiring Meta to destroy all infringing copies of copyrighted works, and a permanent injunction to stop ongoing infringement and CMI removal.</cite> <cite index="2-13">Elsevier also alleges that Llama has generated summaries of scholarly articles riddled with hallucinations and errors, which could potentially damage authors' professional credibility.</cite>

<cite index="1-10">A Meta spokesperson has said the company would "fight this lawsuit aggressively."</cite>

Legal context

The suit lands amid an unsettled body of case law on generative AI training. <cite index="2-5,2-6">In 2025, the U.S. District Court for the Northern District of California granted a summary judgment in favor of Meta in a similar case, ruling that the training of the Llama model constituted fair use based on the evidence presented. In Kadrey v. Meta, 13 plaintiffs, mostly fiction writers, argued that Meta obtained their works from pirate sites and made unauthorized copies while training the Llama model.</cite> <cite index="1-9">Some cases have been settled but, overall, they have yet to establish a clear precedent on whether it is legal to use copyrighted works to train an LLM.</cite> The new action, in which Elsevier is one of several publisher plaintiffs, is the first to bring scholarly publishing rights holders into the dispute, raising the prospect of expanded discovery into Meta's use of academic corpora.
