Major publishers sue Meta over alleged use of books and articles to train Llama AI

Elsevier, Cengage, Hachette, Macmillan, McGraw Hill and author Scott Turow file class-action complaint in Manhattan federal court alleging large-scale copyright infringement

By Ajmal Hussain

Key Points

Five major publishers and author Scott Turow sued Meta in Manhattan federal court alleging unauthorized use of books and journal articles to train the Llama AI model.
The complaint alleges millions of works were copied without permission, including specific novels cited by the plaintiffs, and seeks class certification and unspecified monetary damages.
The case adds to broader litigation over whether using copyrighted material to train AI qualifies as fair use; prior related cases have produced conflicting judicial rulings and at least one large settlement.

Five leading publishers - Elsevier, Cengage, Hachette, Macmillan and McGraw Hill - together with author Scott Turow filed a proposed class-action lawsuit in Manhattan federal court on Tuesday, accusing Meta Platforms of using their copyrighted books and journal articles without authorization to train its Llama artificial intelligence model.

The complaint alleges Meta copied and processed millions of works - ranging from textbooks and scientific papers to novels - to train large language models that respond to human prompts. The plaintiffs contend those works were used en masse without permission, a practice they describe as pirating material to develop AI capabilities.

According to the filing, examples of specific works allegedly included in the dataset used to train the model are "The Fifth Season" by N.K. Jemisin and "The Wild Robot" by Peter Brown. The publishers have asked the court to permit them to represent a broader class of copyright owners and are seeking monetary damages, though the complaint does not specify an exact amount.

Meta did not immediately respond to a request for comment on Tuesday.

Maria Pallante, president of the Association of American Publishers, is quoted in the complaint materials criticizing the alleged conduct: "Meta’s mass-scale infringement isn’t public progress, and AI will never be properly realized if tech companies prioritize pirate sites over scholarship and imagination."

Legal context and wider disputes

The lawsuit marks a fresh escalation in an ongoing legal dispute between copyright owners and technology firms over the use of copyrighted material to train AI systems. Dozens of plaintiffs - including authors, news organizations and visual artists - have brought suits against multiple AI companies, alleging similar patterns of unauthorized ingestion of copyrighted content.

Central to these cases is whether the use of copyrighted works to train AI qualifies as fair use, with courts asked to determine whether such training is sufficiently transformative to avoid infringement. Prior rulings in related litigation have diverged: two judges addressing the question issued conflicting decisions in the prior year, underscoring the unsettled legal landscape.

The complaint also notes that Anthropic, a company backed by Amazon and Google, resolved one of the earlier class actions by agreeing to a settlement payment of $1.5 billion to a group of authors.

What the complaint seeks and immediate implications

The publishers are seeking permission to represent a larger class of copyright holders, along with unspecified monetary damages. The filing signals that established creators and rights holders are pushing aggressively to assert control over how their works are used in AI development.

How courts ultimately interpret fair use in the context of AI training datasets will determine whether this suit and similar cases result in monetary awards, licensing regimes, or altered practices in how AI developers assemble training data. For now, the litigation opens another front in a legal debate that remains unresolved.

Risks

Uncertainty over judicial interpretation of fair use for AI training could produce unpredictable legal outcomes for the technology and publishing sectors, affecting AI development and content licensing.
Potential for large monetary damages or settlements could raise costs for AI developers and influence investment and operational choices in the tech sector.
Ongoing litigation could disrupt relationships between content owners and AI companies, affecting business models across publishing, education, and software sectors that rely on trained language models.
