Jun 26, 2025 11:30:00

A judge dismisses a lawsuit alleging that Meta used books to train AI, but says that this doesn't mean that Meta's use of copyrighted material is legal

Meta, which operates Facebook and Instagram and develops its own AI, Meta AI, was sued by several authors for 'using copyrighted books to train its AI,' but Judge Vince Chhabria, who is presiding over the case, has dismissed the lawsuit.

ORDER DENYING THE PLAINTIFFS' MOTION FOR PARTIAL SUMMARY JUDGMENT AND GRANTING META'S CROSS-MOTION FOR PARTIAL SUMMARY JUDGMENT
(PDF file)

https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.598.0.pdf

Meta Beats Authors' Copyright Suit Over AI Training on Books
https://news.bloomberglaw.com/legal-ops-and-tech/meta-beats-copyright-suit-from-authors-over-ai-training-on-books

Judge dismisses authors' copyright lawsuit against Meta over AI training | AP News
https://apnews.com/article/meta-ai-copyright-lawsuit-sarah-silverman-e77968015b94fbbf38234e3178ede578

A group of 13 authors, including comedian Sarah Silverman, Jacqueline Woodson, author of 'Each Kindness,' and Ta-Nehisi Coates, author of 'Between the World and Me,' filed a copyright infringement lawsuit against Meta, alleging that the company was stealing their works to train its AI.

OpenAI and Meta sued by three authors for copyright infringement - GIGAZINE

The lawsuit alleges that Meta CEO Mark Zuckerberg authorized the Llama development team to use data sets containing copyrighted books and documents to train Llama, but Meta argued that downloading copyrighted data was not a copyright violation.

Meta CEO Mark Zuckerberg is being pursued in a lawsuit for allowing the AI 'Llama' development team to use copyrighted works without permission - GIGAZINE

Judge Chhabria of the federal court in San Francisco, who is presiding over the case, dismissed the plaintiffs' lawsuit on June 25, 2025 local time. He determined that the plaintiffs had made the wrong arguments, and the ruling applies only to these plaintiffs. They had argued that Meta's AI (Llama) could reproduce parts of their books and that Meta's unauthorized use of the works for training reduced authors' ability to license their works as training data for large language models.

The plaintiffs allege that Meta 'caused massive copyright infringement' by downloading authors' books from pirated online repositories and using them to train Meta's generative AI, Llama. 'Meta could and should have paid to purchase and license these literary works,' the plaintiffs' lawyers said.

Meta argues that US copyright law allows copyrighted material to be copied without permission when it is transformed into something new, and that its use of the data therefore constitutes fair use. It also argues that the text Llama generates is fundamentally different from the text in the books used for training, and that Llama will not output text that copies those books wholesale, even if a user requests it.

Judge Chhabria concluded that neither of the plaintiffs' two arguments held water, stating that 'Llama is not capable of generating enough text from the plaintiffs' books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data.' He dismissed the case, stating that 'the court cannot make a decision based on general understanding, but must make a decision based on the evidence presented by the parties.'

Judge Chhabria noted that 'this ruling does not support the proposition that Meta's use of copyrighted material to train its language models is lawful,' and that it 'merely supports the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.' In fact, Judge Chhabria said that the plaintiffs had barely mentioned, and had presented no evidence for, the argument that could potentially have prevailed: that Meta copied the plaintiffs' works to create a product that generates similar works and dilutes the market for them.

In addition, Judge Chhabria rejected Meta's argument that 'requiring AI companies to comply with decades-old copyright laws would slow important technological advances at a crucial time.' 'AI is expected to generate billions, or even trillions, of dollars in revenue for developers, so if, as companies say, they need to train their AI with copyrighted material, they should find a way to compensate copyright holders,' he said.

The plaintiffs' lawyers issued a statement saying, 'The Court found that AI companies that incorporate copyrighted works into their models without obtaining permission from or paying the copyright holders are generally in violation of the law. However, the Court ruled in Meta's favor despite Meta's historically unprecedented record of infringing use of copyrighted works. We respectfully disagree with this conclusion.'

Meanwhile, Meta said, 'Open source AI models are driving transformative innovation, productivity and creativity for individuals and businesses, and fair use of copyrighted material is a critical legal framework for building this transformative technology.'

In a similar case, Anthropic, the developer of the AI chatbot Claude, was sued by three American authors for copyright infringement, but the company won a court ruling that 'training an AI with legally purchased books, even without the author's permission, is fair use and does not infringe copyright.'

A ruling is made that 'AI companies do not need the author's permission to train AI with legally acquired books' - GIGAZINE

Jun 26, 2025 11:30:00 in Software, Posted by logu_ii