Bartz v. Anthropic: Judge Alsup Certifies Class for Rightsholders of 7 Million Books Used by Anthropic

Phillip Burton Federal Building, San Francisco, which houses one of the courthouses of the Northern District of California, by Ken Lund, CC-BY-SA

Late last week Judge Alsup, presiding over the Bartz v. Anthropic copyright AI litigation, granted a motion to certify a class representing authors and rightsholders of nearly 7 million books. If you are a book author (or a publisher, or an heir to an author), you should be paying attention because there is a good chance that you could be included in this class.

The decision is a major turning point in the litigation: millions of authors and thousands of publishers, literary estates, and other rightsholders are now represented by a small group of three trade book authors and two of their loan-out companies.

It also means that Anthropic now faces a very real prospect of tens of billions of dollars in damages; Professor Ed Lee explains the possible range of damages at play. As we’ve previously explained, this may tip the scales toward a settlement by Anthropic (something Judge Alsup has encouraged). That, as we’ve noted, carries its own risks for authors.

We’re disappointed by the decision. It is flawed in a number of ways and, if left in place, would create an incredibly complicated, messy class action claims process that would run roughshod over the rights of many authors. Judge Alsup has been very thoughtful and measured in this litigation, so this ruling comes as something of a surprise: it misunderstands some fundamental aspects of the book publishing industry, how publishing contracts divide rights, and how difficult it really is to identify rightsholders, while glossing over some of the most challenging issues in determining copyright ownership in the first place. Such issues have vexed authors, librarians, and the publishing industry for decades.

The newly certified class covering 7 million books, give or take

In Anthropic’s partial win on the fair use issue several weeks ago, you might recall, Judge Alsup found that Anthropic’s use of books for AI training was fair use, but he declined to find fair use for books Anthropic downloaded from Books3, LibGen, and PiLiMi to create a “central library” for future use. For that, the court indicated, the parties would need to go to trial.

Given that decision, the class Judge Alsup certified covers only a subset of the class we described previously. This is the class he certified (Books3 materials were excluded because of a lack of metadata):

All beneficial or legal copyright owners of the exclusive right to reproduce copies of any book in the versions of LibGen or PiLiMi downloaded by Anthropic. “Book” refers to any work possessing an ISBN or ASIN which was registered with the United States Copyright Office within five years of the work’s publication and which was registered with the United States Copyright Office before being downloaded by Anthropic, or within three months of publication. Excluded are the directors, officers and employees of Anthropic, personnel of federal agencies, and district court personnel.
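The class definition packs several date conditions into one sentence. The Python sketch below encodes one plausible reading of it: the work must have an ISBN or ASIN, must have been registered within five years of publication, and must have been registered either before Anthropic downloaded it or within three months of publication. This is an illustrative simplification, not the court’s test: the `BookRecord` fields and the `in_class` helper are hypothetical, “three months” is approximated as 90 days, and questions such as which edition’s publication date controls are ignored. (The two windows appear to track 17 U.S.C. § 410(c), which treats registration within five years of publication as prima facie evidence of validity, and § 412, which preserves statutory damages for works registered within three months of publication.)

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


# Hypothetical record for one downloaded book; the field names are
# illustrative and do not come from the court's order or any real dataset.
@dataclass
class BookRecord:
    has_isbn_or_asin: bool
    published: date
    registered: Optional[date]  # None if never registered with the Copyright Office
    downloaded: date            # when Anthropic downloaded the LibGen/PiLiMi copy


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, moving Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def in_class(book: BookRecord) -> bool:
    """One simplified reading of the certified class definition (an assumption)."""
    if not book.has_isbn_or_asin or book.registered is None:
        return False
    # Registered within five years of publication.
    within_five_years = book.registered <= add_years(book.published, 5)
    # Registered before the download, or within ~three months (90 days) of publication.
    before_download = book.registered < book.downloaded
    within_three_months = book.registered <= book.published + timedelta(days=90)
    return within_five_years and (before_download or within_three_months)


if __name__ == "__main__":
    # Registered after the download, but within three months of publication.
    late_but_timely = BookRecord(True, date(2021, 5, 1), date(2021, 7, 1), date(2021, 6, 1))
    print(in_class(late_but_timely))  # True
```

On this reading, the three-month clause is an alternative to the before-download requirement, not to the five-year requirement: a book registered six years after publication is excluded even if it was registered before the download.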

Judge Alsup approved the three plaintiff authors and their two “loan-out” companies (companies the authors wholly own as LLCs or the like, mostly for tax purposes) as representatives of all members of the class. The human authors are:

Andrea Bartz, author of thrillers like The Lost Night, The Herd, We Were Never Here, and The Spare Room, published by Penguin Random House;
Charles Graeber, nonfiction author of The Good Nurse and The Breakthrough, published by Hachette; and
Kirk Wallace Johnson, author of nonfiction works including To Be A Friend Is Fatal, The Feather Thief, and The Fishermen and the Dragon.

As the decision explains, the books Anthropic used from LibGen and PiLiMi are thought to number some 7 million titles. We do not know how many of those are registered with the Copyright Office.

It’s worth noting that this is not the class the plaintiffs actually proposed, and neither Anthropic nor the plaintiffs had the chance to adequately brief the court on the implications of this definition. Judge Alsup crafted the class himself after the parties submitted their briefs. For the class that was briefed, Anthropic raised numerous objections regarding the adequacy of the class representatives and the suitability of the suit for class-wide resolution. We think all of those concerns still hold true or are actually made worse by the now-approved class definition.

Do we even know what books we are talking about?

The court spends just a few pages attempting to address what has been one of the most vexing questions for mass digitization efforts over the last three decades: how to determine copyright ownership of large numbers of books.

To begin, the court addresses the problem of figuring out just which books are in LibGen and PiLiMi. We know both have accompanying metadata files that purport to describe their contents, but those files contain errors. The judge made much of Anthropic’s internal efforts to identify works accurately for its own research using the accompanying metadata: “Anthropic’s own expert stated in his declaration that a list of errors found in LibGen over the span of several years has lengthened by about one post for every 400 books.” Across 7 million books, that rate would mean approximately 17,500 entirely misidentified books, which is obviously not a great place to start when trying to find their copyright owners. There are, of course, other problems as well, such as botched downloads and file mismatches, that could introduce further errors.

The court proposes some fanciful means of addressing this deficiency, leaning heavily on an expert report suggesting that file hashes can be used to identify works accurately: “there is the common hashing method that plaintiffs’ Expert Ben Zhao proposed to spot and solve problems with the metadata — and to stand as its own, common method for identifying and indeed proving books were duplicated in their entirety or at least in generous verbatim chunks. . . . The specific implementation would compute multiple hashes across a class claimant’s work and compare the hashes both individually and in bunches against those from Anthropic’s LibGen book files.”

This all sounds great until you realize that it would require either that every class claimant produce an electronic copy of their book or that the lead plaintiffs somehow acquire 7 million legitimate ebooks to generate hashes of the originals. Only a small number of libraries in the entire US have been able to acquire so many books, and they did so over a period of decades at an expense of many millions of dollars.
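The court’s description of the method plaintiffs’ expert Ben Zhao proposed is brief, and the source does not describe his actual implementation. Purely as an illustration of the general technique (chunk-level hashing to detect verbatim copying), the Python sketch below normalizes both texts into word tokens, hashes consecutive, non-overlapping chunks of the claimant’s book with SHA-256, and checks each chunk against every sliding window of the downloaded file. It reports the share of chunks found (the “individual” comparison) and the longest run of consecutive matches (the “bunches”). The chunk size, the normalization rule, and all function names are our assumptions. Note the first argument to `compare`: the method cannot run at all without a full reference text of each claimant’s book.

```python
import hashlib
import re


def normalize(text: str) -> list[str]:
    """Lowercase and keep only word tokens, so line breaks, punctuation,
    and file-format differences do not affect the hashes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def _hash_words(words: list[str]) -> str:
    """SHA-256 of a space-joined window of tokens."""
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def reference_chunks(text: str, chunk_words: int) -> list[str]:
    """Hashes of consecutive, non-overlapping chunks of the claimant's book."""
    words = normalize(text)
    return [
        _hash_words(words[i:i + chunk_words])
        for i in range(0, len(words) - chunk_words + 1, chunk_words)
    ]


def candidate_windows(text: str, chunk_words: int) -> set[str]:
    """Hashes of every sliding window in the downloaded file, so a match
    does not depend on where the chunk boundaries happen to fall."""
    words = normalize(text)
    return {
        _hash_words(words[i:i + chunk_words])
        for i in range(len(words) - chunk_words + 1)
    }


def compare(claimant_text: str, libgen_text: str,
            chunk_words: int = 50) -> tuple[float, int]:
    """Return (share of claimant chunks found, longest run of consecutive hits)."""
    ref = reference_chunks(claimant_text, chunk_words)
    found = candidate_windows(libgen_text, chunk_words)
    hits = [h in found for h in ref]
    coverage = sum(hits) / len(hits) if hits else 0.0
    longest = run = 0
    for hit in hits:
        run = run + 1 if hit else 0
        longest = max(longest, run)
    return coverage, longest


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog while the small cat sleeps on the warm mat."
    download = "PREFACE. The quick brown fox jumps over the lazy dog,\nwhile the small cat sleeps on the warm mat."
    print(compare(original, download, chunk_words=6))  # (1.0, 3)
```

Hashing chunks of normalized text, rather than whole files, is what lets the comparison tolerate formatting differences: a single hash of an EPUB file and one of a PDF of the same book would not match. Even so, exact chunk hashes still break on OCR errors, hyphenation, or edition differences, and none of this helps unless someone first supplies the original text.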

No realistic plan for identifying rightsholders

But even assuming away the problem of identifying the right books, the court essentially ignores decades of work on the problem of identifying who owns the rights to all of these works. Worse still, many of the works are likely to be “orphan works,” i.e., works whose owners cannot be identified or located.

Because copyright ownership information is not public and there is no canonical registry of ownership, the difficulty of tracking down rightsholders for downstream use has prompted serious study. Congress has considered several bills to address the orphan works problem; the Copyright Office has conducted multiple studies and roundtables and produced several reports; and the EU has adopted its own directive on orphan works, along with accompanying infrastructure for communicating rights information. In the Google Books case (the closest analogue, involving a mass digital corpus of books of comparable size), Google and the settling plaintiffs concluded that it would cost Google $34.5 million to set up a “Books Rights Registry” to identify owners for payouts and licensing under the proposed settlement. Even when works are not true orphans, it can take tremendous effort to track down putative rightsholders and communicate with them effectively about copyright issues involving their work. This is especially true in academic publishing, where norms on copyright ownership vary more than they do for the kinds of trade books the class representatives write.

Judge Alsup’s solution? A really, really good letter-writing campaign. He explains: “A rigorous notice process will be required. Notice will be by first-class mail and email to the author, publisher, and copyright owner listed on the copyright certificate for each work recorded. Notice will also be sent by first-class mail and email to all trade and university publishers in the United States. Notice will also be published at least once in a trade journal. And, as a step in the claims process, notice also will be served by the class claimant himself on all others he attests may claim a copyright interest in the work to notify them that he is claiming it — including all those with whom he has contracted concerning the copyright and/or publishing.”

Judge Alsup gives no hint as to what will be done about authors who have died and whose literary estates hold rights split among multiple parties; how now-defunct or bankrupt publishers will be handled; how the many foreign rightsholders now included in this class will be reached, or how the foreign rights collectives that manage their works will be integrated into such a notification process; or how rightsholders who hold exclusive reproduction rights in portions of books (e.g., chapters, images, or other inserts) will be addressed. Across a corpus of 7 million books, these issues are likely to be rampant. If the court is serious about resolving the rights issues accurately, it will take a herculean effort just to track down these rightsholders, much less get them to engage meaningfully with this suit as the court envisions (e.g., by sending their own notices to co-authors or others whom they know may have an interest in the work).

Problems with common issues and adequate representation

To certify this kind of class, the court must conclude, among other things, that “questions of law or fact common to class members predominate over any questions affecting only individual members” and that a class action is superior to other methods of resolving the dispute, an inquiry that includes “the class members’ interests in individually controlling the prosecution or defense of separate actions.” The law also requires that the class representatives “fairly and adequately protect the interests of the class.”

As with identifying the class, the court rushes, without rigorous analysis, through the problem of class members having different and sometimes conflicting legal claims to the same work. For example, on the problem of rightsholders disputing who actually owns the rights in a book (often a dispute between publisher and author), the court simply states that “If disputes arise over ownership, which will be unlikely, the district court or as needed a jury will resolve them.”

Who gets to control and authorize reproduction for AI purposes is hotly debated, and many publishing contracts, especially those more than a few years old, could not have anticipated the question and are ambiguous on this point. How it is resolved could make the difference between large payouts (both in this case and in the many others currently pending) and nothing at all for many authors. But many of these contracts were individually negotiated, with unique language written in a particular context, which makes it difficult if not impossible for the court to resolve them en masse. If the court truly believes it or a jury can resolve these disputes, we could be headed toward hundreds of mini-trials within this one suit. Beyond being administratively infeasible, that would force many authors and publishers into a legal fight they likely did not want to have at this moment, before a court that is likely inconvenient for them, on an issue on which they hold widely differing views.

Beyond disputes about who controls the relevant rights, the class definition sets up an obvious point of contention: the class covers both beneficial and legal owners of the exclusive right to reproduce. In many cases, then, the same class will include at least two interested parties for the same work: a publisher that may hold the legal exclusive right, and an author who holds a beneficial interest (e.g., royalty rights). The court suggests that the likelihood of conflict is low: “This is because authors and publishers are in business together and will work out the best way to recover.”

Judge Alsup has a rosier view of author-publisher relationships than we do. We think the potential for conflict is very real. It may be most acute for authors who support AI development and want to see Anthropic and other companies build new tools and products that could benefit them. As Anthropic pointed out, “academics, researchers, and writers who use [LLMs] disagree with the position taken by plaintiffs and actively use and benefit from LLMs like Claude.” Claude has 18.9 million monthly active users, and we know that at least some of them are authors who research and write with it. Those authors, and others who support Anthropic’s assertion of fair use (e.g., because they rely on the same fair use rationale to create new works), would have very different incentives than, for example, a for-profit publisher that prioritizes monetizing existing holdings over generating new works that could dilute its market.

The decision also fails to address the many other ways class-wide resolution is impractical. For example, many works in these datasets are openly licensed under Creative Commons licenses, and Anthropic, like any user of these datasets, should be entitled to present work-by-work evidence that it has permission to use them. Many works are also likely to be substantially in the public domain (e.g., public domain books that have an ISBN and were registered with the Copyright Office only because they include a short copyrighted preface).

Conclusion

In certifying this class, Judge Alsup has set in motion a high-stakes, complex legal process that risks entangling millions of authors in a deeply flawed and unworkable system. While the decision dramatically raises the pressure on Anthropic to settle, it does so by sweeping aside decades of hard-won insight into the realities of publishing contracts, copyright ownership, and the difficulty of identifying rightsholders at scale. Authors, publishers, literary estates, and many others (many of whom never asked to be part of this fight) now find themselves represented by three writers, without meaningful input and possibly without shared interests. As this litigation moves forward, it is critical for the broader author community to pay close attention, speak up, and advocate for a process that reflects the messy realities of authorship, not just legal expedience.

Discover more from Authors Alliance
