Elsevier v. Meta: AI Training Lawsuit Explained

Photo: Edwin Denby with megaphone at ball field, by Harris & Ewing (Library of Congress, LCCN 2016891774).

Today in the Southern District of New York, Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill — joined by novelist Scott Turow and his company S.C.R.I.B.E., Inc. — filed a class action against Meta and Mark Zuckerberg over the use of copyrighted works to train Meta’s Llama models. The complaint has six counts: reproduction by torrenting, reproduction from web-scraped datasets, reproduction during training, distribution by torrenting, contributory infringement against Zuckerberg personally, and removal of copyright management information under DMCA §1202(b).

Much of the underlying conduct alleged in the complaint is already being litigated in Kadrey v. Meta, where individual authors have raised overlapping claims about Meta’s training practices. What makes this new filing distinctive is not the conduct it targets but the content it focuses on and the class it proposes.

Who and what is swept into the case

The complaint defines the proposed class as all legal or beneficial owners of registered copyrights, in whole or in part, for any book with an ISBN or any journal article with a DOI or ISSN that Meta reproduced through torrenting, scraping, or training. The class is limited to works registered with the Copyright Office either within five years of publication (and before Meta’s use) or within three months of publication.

Importantly, the class covers authors and publishers who own copyright in academic journal articles, not just the trade books and educational textbooks that most previous suits have focused on. The sample works listed in the complaint’s Exhibit A include five Elsevier journal articles spanning oncology, urology, and the philosophy of mind. One of those articles, “Monoclonality of Multifocal Epithelioid Hemangioendothelioma of the Liver by Analysis of WWTR1-CAMTA1 Breakpoints,” by authors at Memorial Sloan-Kettering in New York, was NIH-funded and is freely available to the public on PubMed Central.

On my reading, the class definition would cover academic authors who have published a journal article during the class period and retained their rights. This would include authors who retained partial rights in their work and authors who hold copyright jointly with their publishers.

The class also extends to a much larger body of academic work whose authors have formally signed copyright over to publishers, the standard arrangement at most commercial journals, including Elsevier’s. Many of those authors retain specific rights, such as the right to deposit a postprint in a university institutional repository under a Green Open Access policy. But the core rights at issue in this lawsuit (the exclusive right to reproduce copies under § 106(1) and the right to control copyright management information, or CMI, under DMCA § 1202(b)) sit with the publisher. That makes the publisher, not the author, the class member. An author who specifically pushed to preserve public access to their own work would therefore not be in the class at all: they could not opt out of litigation pursued in the publisher’s name and would have no direct say in any settlement that results.

Finally, the class definition contains an important wrinkle: it covers only those works that Meta used “without authorization.” Many fully open access and hybrid journals release articles under Creative Commons licenses, most commonly CC BY, which grant broad permission to copy, distribute, and prepare derivative works on stated conditions, including attribution and, in some variants, limits on commercial use. Whether those licenses authorize what Meta is alleged to have done, namely copying entire articles into a training corpus to build and commercialize Llama while stripping copyright management information in the process, is a question that has not been squarely resolved.

If sorting class members from non-class-members requires the court to decide which CC-licensed works fall inside the class definition and which fall outside, this case could end up being a vehicle for resolving whether and how common open licenses cover AI training. That is a question with implications well beyond Meta and Llama. It also places open access publishers, and the many authors who chose CC licensing precisely because they wanted their work used and reused, in the awkward position of having their license terms construed in a class action they may not even know they are part of.

What happens next

Class actions require class representatives to fairly and adequately represent the class. I think many academic authors in this class will not be enthusiastic about Elsevier representing their interests in court (I’m definitely one of them), and the class representatives may have a hard time showing that they can do so given Elsevier’s history of antagonistic relations with many members of this class. To give just one example, a large number of academic authors have publicly boycotted Elsevier in response to its business practices.

But the case isn’t so straightforward. I think many are also wary of Meta as a company, particularly because this case may lead to a settlement that would allow Meta to train Llama on licensed Elsevier content. Academic authors and researchers stand to gain little from such a settlement or licensing deal, and may stand to lose much if the fair use defense for web scraping or computational analysis is eroded in the course of this litigation.

For now, this is a putative class action, and a lot has to happen before it goes anywhere. Meta will respond, motions will be filed, and at some point the court will decide whether to certify the class. If certification happens, notice will go out to class members, and individual rightsholders may have an opportunity to opt out.

Authors Alliance will be following the case closely and will share more as it develops. For now, the most useful thing scholarly authors can do is recognize that many of their works will be implicated in this suit and take steps to stay informed as it progresses.

2 thoughts on “Surprise: Elsevier is Suing Meta For You?”

Authors should learn to read licensing agreements and to distinguish “fake” open access from the full open access offered by publishers that use true CC BY licenses. Fun fact: if you publish something under CC BY-NC, you might think that nobody, including Elsevier, can make money from your work, right? That’s what “NC” means, right? Wrong.

The Elsevier license agreement is structured to give Elsevier exclusive rights to commercially monetize authors’ work, while the author receives nothing. As disclosed at : “Authors publishing [with Elsevier] under the CC BY-NC-ND or CC BY-NC licenses agree not to license any third party to reuse their articles or any part of their articles for commercial purposes. Elsevier has the exclusive right to license third parties to do this.”

I don’t think authors and librarians pay enough attention to the fact that open access comes in very different flavors. More than ever, they should support full open access publishers that do not double- or triple-dip through article processing charges (APCs), subscription fees for hybrid journals, and AI licensing deals.

I hope the courts will find that the license agreement authors signed with Elsevier contradicts the spirit of the CC BY-NC license. But at the end of the day, it is also the responsibility of authors to understand what they are signing — and whether they are unintentionally signing away rights they thought they were retaining, including commercial reuse rights.
