
In an earlier post, we shared details from Judge Alsup’s decision on Anthropic’s motion for summary judgment in Bartz v. Anthropic, the first decision from one of many cases examining whether using in-copyright works to train LLMs constitutes fair use.
We’ve also shared information related to Judge Alsup’s recent decision to certify a class representing authors and rightsholders of nearly 7 million works. If you are a rightsholder of one or more of these works, it is quite possible that you are part of this class. We think it’s important for you to begin evaluating whether you wish to be a part of the class (you will be able to opt out) and whether this litigation represents your interests.
As we highlighted before, the court found that the use of in-copyright works for LLM training was a “quintessentially transformative” fair use. But the court declined to find that Anthropic’s download of pirated copies to build a “Central Library” was a fair use – that issue must go to trial.
As promised, here’s additional analysis of the decision, including some speculation as to where things might go from here. For additional reading on this case, we would highly recommend Professor Matthew Sag’s Anthropic’s multi-billion dollar loss in Bartz v. Anthropic is really a win (for AI) and the extensive coverage found in Chat GPT is Eating the World.
Procedural note: a reminder that the decision is in response to Anthropic’s motion for summary judgment
It’s important to remember, procedurally, where we are in this case: when a party (Anthropic) moves for summary judgment, any disputed facts are generally construed in the light most favorable to the non-moving party (Bartz and her fellow plaintiffs).
“To prevail on summary judgment, Anthropic must rely on undisputed facts and/or factual inferences favoring the opposing side.” (Bartz at 8)
While there has been substantial discovery in this case, some key facts are still either unknown or in dispute. We’ll try our best to identify any unknowns below, but it’s good to keep in mind that some of the facts at this stage may be better characterized as inferences drawn by the court to favor the plaintiff authors.
Judge Alsup’s statements and the structure of the decision appear to be aiming to bring the parties to settlement
As we observed in an earlier post regarding the possibility of a settlement, Judge Alsup has been uncharacteristically open to seeing the parties in this case settle. We wrote:
In a notably rare move, he granted “permission to negotiate class settlement” before class certification is resolved (something he ordinarily prohibits, though other judges allow), even telling the parties he had “invited the parties to settle the case, and sooner rather than later.”
Judge Alsup wrote, “We are far enough along that any settlement should come sooner rather than later if the parties prefer that the undersigned judge manage the class approval process, given that the judge will likely take inactive status before the end of the year.”
One way to read Judge Alsup’s current decision is as a robust continuation of this encouragement for the parties to actively pursue settlement. Due to the structure of the order, both parties now risk substantial losses at later stages of litigation — if Anthropic does not settle, it faces the possibility of billions of dollars in damages. Professor Matthew Sag has conservatively estimated potential damages at over 2 billion dollars: “statutory damages for this infringement could total $2.25 billion (calculated at $750 per work).” Damages are difficult to predict at this time, but there is a considerable risk that the total could be much higher.
Given that Judge Alsup peppers the order with references to bad faith and its effect on damages (“Of course, if infringement is found, bad faith would matter for determining willfulness” (Bartz at 19); “no damages from pirating copies could be undone by later paying for copies of the same works” (Bartz at 24)), the specter of tens of billions of dollars in damages looms large; Anthropic must be strongly weighing the advantages of settlement.
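To make that exposure concrete, here is a minimal back-of-the-envelope sketch of the statutory-damages arithmetic. The per-work figures come from 17 U.S.C. § 504(c); the work count is our assumption, inferred from Professor Sag’s estimate ($2.25 billion ÷ $750), not a number established by the court:

```python
# Back-of-the-envelope statutory-damages arithmetic.
# WORKS is an assumption implied by Professor Sag's estimate
# ($2.25B / $750 per work); it is not a figure found by the court.
WORKS = 3_000_000

# Per-work statutory damages ranges under 17 U.S.C. § 504(c)
STATUTORY_MIN = 750      # statutory minimum per work
STATUTORY_MAX = 30_000   # maximum per work (non-willful)
WILLFUL_MAX = 150_000    # maximum per work if infringement is willful

for label, per_work in [("statutory minimum", STATUTORY_MIN),
                        ("non-willful maximum", STATUTORY_MAX),
                        ("willful maximum", WILLFUL_MAX)]:
    print(f"{label}: ${WORKS * per_work:,}")

# statutory minimum:   $2,250,000,000
# non-willful maximum: $90,000,000,000
# willful maximum:     $450,000,000,000
```

Even at the statutory minimum, the exposure exceeds two billion dollars; a finding of willfulness raises the per-work ceiling two-hundredfold, which is why the references to bad faith change the settlement calculus so dramatically.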
If Anthropic is able to reach a settlement with the plaintiffs, it will have successfully addressed its exposure to outsized damages for willful copyright infringement, secured a win for using in-copyright works to train LLMs, and come away with a clearer roadmap for how it and other AI researchers might approach future training activities. For a company currently valued at 61.5 billion dollars (to be clear, this valuation does not mean that Anthropic has, or could easily secure, that amount of money), a settlement might well look like a savvy business move. It would put Anthropic in a more stable legal position than many of its primary competitors, which might be attractive to future investors. [Note: here we should also acknowledge a counterargument – if Anthropic settles, it may encounter future litigation from groups not represented by this class. After all, this class does not represent all rightsholders implicated in Anthropic’s training activities. One large settlement may simply be the tip of a very large iceberg.]
The plaintiff authors and their attorneys also have something substantial to lose. With this decision, they are at one of their strongest negotiating points: even if they were to win at trial and secure outsized damages for willful infringement, Anthropic could draw this litigation out for years through appeals. In those intervening years, the legal landscape may become more accommodating to the methods by which Anthropic amassed its training corpus, and Anthropic may be able to whittle down the number of works that fall within Judge Alsup’s “Central Library” – a useful device for the opinion, but one whose concrete details and numbers remain opaque.
Whether a settlement is a positive development for all authors and for the public interest is a separate question and one we will continue to look at closely in the coming months. At minimum, a settlement risks consolidating a lot of market power in the hands of a single AI company. We’ll share more here as we learn more.
We don’t know precisely what works have been used to train the LLMs
Central to this decision is the notion that we can cleanly distinguish two groupings of works used by Anthropic: (1) “Copies used to Build a Central Library” and (2) “Copies used to train LLMs.”
Conceptually, Anthropic’s “Central Library” is the collection of all in-copyright works it had acquired, whether through purchase and digitization or through other methods, including illegal download from “shadow libraries.” “Anthropic kept the library copies in place as a permanent, general-purpose resource even after deciding it would not use certain copies to train LLMs or would never use them again to do so.” (Bartz at 2).
However, if we read the order closely, we see that these groupings are not so clearly defined based on facts currently available to the court:
“— and yet Anthropic has resisted putting into the record what copies or even sets of copies were in fact used for training LLMs.”…“All deficiencies must be held against Anthropic and not the other way around.” (Bartz at 9)
If we don’t know what works were actually used to train LLMs, then the Central Library/copies-used-to-train-LLMs dichotomy breaks down. Presumably, Anthropic is in a position to clearly delineate between the two, and it would be compelled to do so at trial. For now, with that information in its hands, Anthropic is in the best position to evaluate the extent of its potential liability.
What does this mean? First, it means that the number of works that are ultimately found to be in the Central Library and not used for training LLMs (and thus potentially not a fair use) could be a far smaller number than we are currently estimating. It’s hard to tell.
We think this could also be a point of dispute in future stages of litigation, should a settlement not succeed – if an unauthorized work is collected for possible future training of an AI, but that use is not immediately and clearly defined, should that preclude fair use in all instances? Clearly, that is a possibility left open by this district court, but it also looks like a standard that could be easily gamed by the industry (e.g., always have an LLM model that trains on everything). It’s a standard that could create a degree of inflexibility at odds with the progress-oriented purposes of copyright law.
From the order, we can glean some lessons and lingering questions for future AI training and research
- If you’re an AI researcher, try to avoid viewing the law as “legal/practice/business slog.”
Silicon Valley has been operating under some variation of a “Move fast and break things” culture for quite some time. We can see it in this case. We can see it in Kadrey v. Meta, which also involves the use of millions of works from unauthorized sources. In reading through this decision, we can see Judge Alsup’s exasperation with this approach – he mentions Anthropic’s desire to avoid “legal/practice/business slog” multiple times in the order, and it shapes his views regarding the “Central Library” (“Pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own use — and not a transformative one.” (Bartz at 19)).
Judge Alsup articulates the contours of a better approach – at minimum, he thinks AI researchers should lawfully acquire the works used to train AI. As one viable method, he favors the purchase of physical copies of books (“Anthropic spent many millions of dollars to purchase millions of print books, often in used condition.” (Bartz at 4)). He favors security measures designed to prevent future unauthorized uses (“The university libraries and Google went to exceedingly great lengths to ensure that all copies were secured against unauthorized uses — both through technical measures and through legal agreements among all participants. Not so here.” (Bartz at 22)). And he prefers clearly defined uses for works, as opposed to poorly defined aggregation with no clear intention.
If we are to view this district court opinion as a prescription for what AI researchers should do to minimize their legal risk, we would likely recommend, at minimum: (1) use authorized versions of works, even used copies, whenever possible; (2) institute security measures to restrict access to and unauthorized uses of the works; (3) avoid the use of unauthorized versions of works but, if this is necessary, use no more than is required for your purpose, clearly define and justify the use for that purpose, and plan to retain those works for no longer than is absolutely necessary.
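There is no formal compliance standard here, but as a thought experiment, here is a minimal sketch of how a research team might record that kind of provenance, access control, and retention plan for each work in a training corpus. The record type and every field name below are our own hypothetical invention, not anything drawn from the order:

```python
# A hypothetical provenance record for one work in a training corpus,
# loosely tracking the three prescriptions above. The ProvenanceRecord
# type and its fields are illustrative inventions, not a legal standard.
from dataclasses import dataclass
from datetime import date

@dataclass
class ProvenanceRecord:
    title: str
    acquisition: str         # how the copy was obtained
    authorized_source: bool  # prescription 1: lawfully acquired?
    access_restricted: bool  # prescription 2: technical/legal security measures?
    defined_purpose: str     # prescription 3: a concrete, justified use
    retain_until: date       # planned deletion date, not indefinite retention

record = ProvenanceRecord(
    title="Example Novel",
    acquisition="purchased used print copy, digitized in-house",
    authorized_source=True,
    access_restricted=True,
    defined_purpose="training corpus for a specific, documented model run",
    retain_until=date(2026, 6, 30),
)
```

The point is less the particular fields than the habit the order seems to reward: a documented, lawful acquisition path and a concrete, time-limited purpose for every copy retained.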
While such prescriptions may be quite useful for Anthropic, Google, OpenAI, or similar companies moving forward, it seems to us that they have more limited utility for the rest of us. In the context of AI training, the prescriptions above would be quite onerous for any AI researchers beyond incredibly well-capitalized for-profit companies. Put simply, purchasing millions of dollars worth of books, then disbinding, scanning, and storing them, is not a realistic possibility for a small team of academic researchers. Just putting your hands on actual copies of many works can be very difficult, particularly at scale. To the extent that we want to see an ecosystem that supports this work outside of the for-profit, massive-corporate context, we will need something more than this decision provides.
Additional observations/Notes from the decision
The following is not meant to be exhaustive (there’s a lot to digest in the decision), but we would like to highlight two more noteworthy things:
- The court makes quick work of the “market dilution/market obliteration” theory that deeply concerns Judge Chhabria in Kadrey v. Meta.
While Judge Chhabria has been intensely focused on the capacity for AI to destroy the markets for those authors’ works used for training, Judge Alsup is not particularly concerned or swayed by this possibility:
“Authors contend generically that training LLMs will result in an explosion of works competing with their works — such as by creating alternative summaries of factual events, alternative examples of compelling writing about fictional events, and so on. This order assumes that is so. But Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition.” (Bartz at 28)
Judge Alsup is right to observe that the purpose of the Copyright Act is not to protect authors from the competition of all future creative works. We expect this debate will continue to play out in other cases, with Chhabria and Alsup representing opposing sides of the issue.
I would like to present some additional counterpoints to the idea that AI’s creative output will obliterate markets. My position is that we are already so enormously saturated in books, music, movies, and other forms of media that, to the extent obliteration is possible, it has already happened – we are living in the aftermath of market obliteration.
Consider: as of April 2025, over 20 billion videos have been uploaded to YouTube alone, with approximately 20 million videos uploaded daily. UNESCO estimates that approximately 2.2 million books are published each year. A voracious reader, reading a book a day for 50 years, would read fewer than 20,000 of those books. We live in an unknowable abundance of images, texts, and sounds – more than we can ever hope to engage with – and yet we persist and find our own individualized ways to navigate it. AI will certainly shape this abundance, but Judge Alsup’s skepticism may simply be hard-earned wisdom.
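The arithmetic behind that claim is easy to check. The sketch below uses the UNESCO figure cited above; the book-a-day pace and the 50-year horizon are our illustrative assumptions:

```python
# Back-of-the-envelope arithmetic for the "abundance" point above.
# BOOKS_PER_YEAR is UNESCO's figure cited in the text; the reading
# pace and horizon are illustrative assumptions.
BOOKS_PER_YEAR = 2_200_000
READING_YEARS = 50
BOOKS_PER_DAY = 1

lifetime_reading = BOOKS_PER_DAY * 365 * READING_YEARS  # 18,250 books
published_meanwhile = BOOKS_PER_YEAR * READING_YEARS    # 110,000,000 books

print(f"read:      {lifetime_reading:,}")
print(f"published: {published_meanwhile:,}")
print(f"share:     {lifetime_reading / published_meanwhile:.4%}")  # ~0.0166%
```

Even this heroic reader engages with well under a tenth of a percent of the new books published in their lifetime, before counting the existing backlist, videos, music, and everything else.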
- LLM outputs continue to be a live issue
Finally, while Judge Alsup has found that AI training is a fair use, this case does not stand for the principle that all AI outputs are fair use or not infringing. That is very much a live issue (e.g., Disney v. Midjourney is currently focused specifically on outputs) and one that will likely arise with increasing frequency in the coming years – we anticipate that generative AI will produce many substantially similar works, whether intended by users or not. For now, that is not a question before this court:
“Here, if the outputs seen by users had been infringing, Authors would have a different case. And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case.” (Bartz at 12)